Show HN: Imagent – agentic image/video/speech generation
Imagent is an open-source tool that integrates image, video, and speech generation into AI agent workflows. It provides a unified CLI interface supporting multiple AI providers (e.g., OpenAI, Google, ElevenLabs) and manages generated assets in a local library for reuse.
Most agents can reason and write code, but they can't create images, video, or audio — and the ad-hoc scripts people wire up for it are throwaway, provider-locked, and forget every asset the moment they finish. Imagent solves three problems at once:
What it gives you
Generation as an agent capability: The bundled skill lets any compatible agent call the imagent CLI to generate images, video, and speech as a native step in its workflow, with no bespoke per-tool integration and no one-off glue code.
One interface, every provider and model: OpenAI, Azure OpenAI, Google Imagen/Gemini, Flux/BFL, BytePlus / Volcano Engine (火山引擎) Seedream/Seedance, xAI Grok, MiniMax TTS, and ElevenLabs TTS sit behind a single, consistent interface. Users and agents swap providers or models without rewriting prompts, parameters, or calling conventions.
Assets that outlive the prompt: Every generated image, video, and audio clip, along with reusable characters, objects, backgrounds, styles, and references, is captured in a managed local library. Curate, search, and reuse outputs across projects instead of regenerating them from scratch.
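To make the "one interface" idea concrete, here is a minimal sketch of a provider registry. It is not imagent's actual code: `ProviderAdapter`, `ProviderRegistry`, `fakeAdapter`, and the request and asset shapes are hypothetical names chosen for illustration, and the adapters here fabricate a local URI instead of calling a real API. A real adapter would almost certainly be asynchronous; the sketch is synchronous to stay short.

```typescript
// Hypothetical types for illustration only; not imagent's real API.
type Modality = "image" | "video" | "speech";

interface GenerationRequest {
  modality: Modality;
  prompt: string;
}

interface GeneratedAsset {
  provider: string;
  modality: Modality;
  prompt: string;
  uri: string;
}

interface ProviderAdapter {
  name: string;
  supports: Modality[];
  generate(request: GenerationRequest): GeneratedAsset;
}

// Stand-in adapter: builds a made-up file URI rather than calling a provider.
function fakeAdapter(name: string, supports: Modality[]): ProviderAdapter {
  return {
    name,
    supports,
    generate(request: GenerationRequest): GeneratedAsset {
      return {
        provider: name,
        modality: request.modality,
        prompt: request.prompt,
        uri: `file:///tmp/${name}-${request.modality}.bin`,
      };
    },
  };
}

class ProviderRegistry {
  private adapters = new Map<string, ProviderAdapter>();

  register(adapter: ProviderAdapter): void {
    this.adapters.set(adapter.name, adapter);
  }

  // The request is identical no matter which provider is selected;
  // only the provider name changes.
  generate(providerName: string, request: GenerationRequest): GeneratedAsset {
    const adapter = this.adapters.get(providerName);
    if (!adapter) {
      throw new Error(`Unknown provider: ${providerName}`);
    }
    if (!adapter.supports.includes(request.modality)) {
      throw new Error(`${providerName} does not support ${request.modality}`);
    }
    return adapter.generate(request);
  }
}
```

With this shape, switching providers means changing one string while the request object stays untouched, which is the property the list above describes.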
Quick start
Install the CLI:
npm install -g @imagent/cli
imagent doctor
Install the desktop app:
Download the macOS or Windows installer from the latest release.
The desktop app is not signed yet. On macOS, remove quarantine before opening the app:
xattr -cr Imagent.app
On Windows, bypass the SmartScreen warning by choosing More info → Run anyway.
Generate with defaults:
imagent image generate "minimal product photo of a ceramic mug"
imagent video generate "a slow dolly shot through a rainy alley"
imagent speech synthesize "Welcome to imagent, your local creative workspace."
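When a script or agent drives these commands, it helps to build the argument list as an array rather than a shell string, so prompts containing quotes or spaces are passed through intact. The sketch below uses only the three subcommands shown above; `buildImagentArgs` and `SUBCOMMANDS` are hypothetical helper names, not part of the imagent CLI or its packages.

```typescript
// Subcommands copied from the quick-start commands; nothing else is assumed
// about the CLI's flags or options.
const SUBCOMMANDS = {
  image: ["image", "generate"],
  video: ["video", "generate"],
  speech: ["speech", "synthesize"],
} as const;

type Kind = keyof typeof SUBCOMMANDS;

// Build the argument vector for one imagent invocation. The prompt stays a
// single array element, so no shell quoting or escaping is needed.
function buildImagentArgs(kind: Kind, prompt: string): string[] {
  const text = prompt.trim();
  if (text.length === 0) {
    throw new Error("prompt must not be empty");
  }
  return [...SUBCOMMANDS[kind], text];
}
```

The resulting array can be handed to a process spawner that accepts an argument list, for example Node's `child_process.spawn("imagent", args)`.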
Need setup details, provider configuration, desktop installation, or troubleshooting? Visit the documentation site:
https://unliftedq.github.io/imagent/docs
Agent skill integration
The repository includes a ready-to-install skill at skills/imagent. Install it into any compatible agent runtime, then make sure the imagent CLI is available on that agent's PATH.
npx skills add unliftedq/imagent
Use the same install flow for Claude Code, Codex, OpenClaw, Hermes, or other compatible agents. After installation, the agent can run imagent doctor to decide whether to use the shared local gallery and configured providers, or fall back to another generation tool when imagent is not configured.
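The doctor-then-fallback decision described above can be sketched as a small function. This is an illustration under stated assumptions, not the skill's actual logic: it assumes that `imagent doctor` exits with code 0 when the CLI is installed and configured (the README does not document its exit codes), and `chooseBackend`, `DoctorProbe`, and `Backend` are hypothetical names.

```typescript
type Backend = "imagent" | "fallback";

// A probe runs `imagent doctor` and returns its exit code, or null when the
// process could not be started (for example, the binary is not on PATH).
type DoctorProbe = () => number | null;

// Assumption: exit code 0 means "installed and configured".
// Any other code, a null result, or a thrown error selects the fallback.
function chooseBackend(probe: DoctorProbe): Backend {
  let code: number | null;
  try {
    code = probe();
  } catch (err) {
    code = null;
  }
  return code === 0 ? "imagent" : "fallback";
}
```

Injecting the probe keeps the decision testable without the CLI installed. In a Node-based agent, the probe could wrap `child_process.spawnSync("imagent", ["doctor"])` and return its `status` field.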
Typical workflows
Give a coding or automation agent the ability to produce visual and audio assets mid-task, through one audited CLI.
Switch between providers and models for the same prompt without changing how you call them.
Build up a reusable library of characters, styles, and reference assets that compounds across projects.
Curate and revisit everything an agent generated, instead of losing it once the script exits.
Combine terminal automation with desktop-based review and curation over a shared local workspace.
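The reuse workflow in the list above amounts to recording each output with searchable metadata and looking it up later instead of regenerating it. The sketch below is an in-memory illustration only: the project structure says persistence is handled with SQLite, and `AssetLibrary`, `LibraryAsset`, and the field names are hypothetical, not imagent's schema.

```typescript
// Hypothetical record shape; imagent's real schema is not shown in this README.
interface LibraryAsset {
  id: number;
  kind: "image" | "video" | "speech" | "character" | "style";
  prompt: string;
  tags: string[];
}

class AssetLibrary {
  private assets: LibraryAsset[] = [];
  private nextId = 1;

  // Record a generated output and return the stored entry with its new id.
  add(kind: LibraryAsset["kind"], prompt: string, tags: string[] = []): LibraryAsset {
    const asset: LibraryAsset = { id: this.nextId, kind, prompt, tags: [...tags] };
    this.nextId += 1;
    this.assets.push(asset);
    return asset;
  }

  // Case-insensitive search: matches a substring of the prompt or an exact tag.
  search(term: string): LibraryAsset[] {
    const needle = term.toLowerCase();
    return this.assets.filter(
      (asset) =>
        asset.prompt.toLowerCase().includes(needle) ||
        asset.tags.some((tag) => tag.toLowerCase() === needle),
    );
  }
}
```

An agent could call `search` before generating and only fall through to a new generation when nothing suitable comes back.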
Project structure
imagent/
  apps/
    desktop/       # @imagent/studio, Electron desktop application
    cli/           # @imagent/cli, command-line interface
  packages/
    core/          # domain types, ports, and job runtime logic
    providers/     # provider adapters and model catalog
    persistence/   # SQLite, migrations, repositories, file and thumbnail handling
    config/        # configuration and secret management
    ipc/           # desktop IPC contract
    ui/            # shared UI components
Current status
imagent remains in an early stage. Data structures, packaging, and parts of the feature set may continue to evolve. The current version does not include telemetry, automatic updates, cloud sync, or account systems. Desktop packages are unsigned, so macOS may require removing quarantine and Windows may show a SmartScreen warning on first launch.
License
By contributing to imagent, you agree that your contributions will be licensed under the project's Apache License 2.0. You also confirm that you have the right to submit the work under that license.
Acknowledgements
Phosphor icons, Radix UI, Tailwind CSS v4, Bun, Turborepo, Vite, Electron, Commander, zod, zustand, better-sqlite3, sharp, ffmpeg-static, @dnd-kit.
About
Imagine + Agent
unliftedq.github.io/imagent/
Latest release: v0.3.1 (Jun 13, 2026), 11 releases in total.