
Show HN: Imagent – agentic image/video/speech generation

Imagent is an open-source tool that integrates image, video, and speech generation into AI agent workflows. It provides a unified CLI interface supporting multiple AI providers (e.g., OpenAI, Google, ElevenLabs) and manages generated assets in a local library for reuse.

Source: Hacker News AI. Author: unliftedq

Most agents can reason and write code, but they can't create images, video, or audio — and the ad-hoc scripts people wire up for it are throwaway, provider-locked, and forget every asset the moment they finish. Imagent solves three problems at once:

What it gives you

Generation as an agent capability: The bundled skill lets any compatible agent call the imagent CLI to generate images, video, and speech as a native step in its workflow, with no bespoke per-tool integration and no one-off glue code.

One interface, every provider and model: OpenAI, Azure OpenAI, Google Imagen/Gemini, Flux/BFL, BytePlus / Volcengine (火山引擎) Seedream/Seedance, xAI Grok, MiniMax TTS, and ElevenLabs TTS sit behind a single, consistent interface. Users and agents swap providers or models without rewriting prompts, parameters, or calling conventions.

Assets that outlive the prompt: Every generated image, video, and clip, along with reusable characters, objects, backgrounds, styles, and references, is captured in a managed local library. Curate, search, and reuse outputs across projects instead of regenerating them from scratch.
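The "search and reuse" idea can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `Asset` shape, the `AssetLibrary` class, and the tag-based lookup are illustrative names, not taken from the imagent source, and the project structure indicates that the real library persists to SQLite rather than to an array.

```typescript
// Hypothetical asset shape; the real schema is not shown in the README.
type AssetKind = "image" | "video" | "speech" | "character" | "style";

interface Asset {
  id: string;
  kind: AssetKind;
  path: string;
  tags: string[];
}

// In-memory stand-in for a managed local library.
class AssetLibrary {
  private readonly assets: Asset[] = [];

  add(asset: Asset): void {
    this.assets.push(asset);
  }

  // Returns assets of the given kind that carry every requested tag.
  search(kind: AssetKind, tags: string[]): Asset[] {
    return this.assets.filter(
      (asset) => asset.kind === kind && tags.every((tag) => asset.tags.includes(tag)),
    );
  }
}
```

A caller could check `search(...)` first and only trigger a new generation when the result is empty, which is one way to read "reuse instead of regenerating".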

Quick start

Install the CLI:

npm install -g @imagent/cli
imagent doctor

Install the desktop app:

Download the macOS or Windows installer from the latest release.

The desktop app is not signed yet. On macOS, remove quarantine before opening the app:

xattr -cr Imagent.app

On Windows, bypass the SmartScreen warning by choosing More info → Run anyway.

Generate with defaults:

imagent image generate "minimal product photo of a ceramic mug"
imagent video generate "a slow dolly shot through a rainy alley"
imagent speech synthesize "Welcome to imagent, your local creative workspace."
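When these commands are driven from a script rather than typed by hand, it helps to build the argument list in one place. The sketch below uses only the three subcommands shown above; the `MediaKind` type and the `buildImagentArgs` helper are hypothetical names introduced for illustration, and no additional CLI flags are assumed.

```typescript
// Hypothetical helper: maps a media kind to the subcommand pair from the quick start.
type MediaKind = "image" | "video" | "speech";

const SUBCOMMANDS: Record<MediaKind, readonly [string, string]> = {
  image: ["image", "generate"],
  video: ["video", "generate"],
  speech: ["speech", "synthesize"],
};

function buildImagentArgs(kind: MediaKind, prompt: string): string[] {
  const text = prompt.trim();
  if (text.length === 0) {
    throw new Error("prompt must not be empty");
  }
  // The prompt stays a single argv entry, so no shell quoting is required.
  return [...SUBCOMMANDS[kind], text];
}
```

The resulting array can be handed to a process launcher such as Node's `spawn("imagent", args)`, which avoids quoting problems with prompts that contain spaces or quotes.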

Need setup details, provider configuration, desktop installation, or troubleshooting? Visit the documentation site:

https://unliftedq.github.io/imagent/docs

Agent skill integration

The repository includes a ready-to-install skill at skills/imagent. Install it into any compatible agent runtime, then make sure the imagent CLI is available on that agent's PATH.

npx skills add unliftedq/imagent

Use the same install flow for Claude Code, Codex, OpenClaw, Hermes, or other compatible agents. After installation, the agent can run imagent doctor to decide whether to use the shared local gallery and configured providers, or fall back to another generation tool when imagent is not configured.
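The "check, then fall back" decision described above could look roughly like the following in an agent's own tooling. This is a sketch under a stated assumption: it treats a zero exit status from `imagent doctor` as "installed and configured", which the README does not confirm. The `isCliReady` and `chooseBackend` helpers are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

type Backend = "imagent" | "fallback";

// Runs a readiness command and reports whether it exited cleanly.
// Assumption: exit status 0 means the tool is installed and configured.
function isCliReady(command: string, args: string[]): boolean {
  const result = spawnSync(command, args, { stdio: "ignore", timeout: 30_000 });
  // result.error is set when the binary is not found on PATH or the call times out.
  return result.error === undefined && result.status === 0;
}

// Picks imagent when the readiness check passes, otherwise another generation tool.
function chooseBackend(command = "imagent", args: string[] = ["doctor"]): Backend {
  return isCliReady(command, args) ? "imagent" : "fallback";
}
```

Calling `chooseBackend()` with no arguments runs `imagent doctor`. On Windows, npm-installed commands are typically `.cmd` shims, so the spawn call may need the `shell: true` option there.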

Typical workflows

Give a coding or automation agent the ability to produce visual and audio assets mid-task, through one audited CLI.

Switch between providers and models for the same prompt without changing how you call them.

Build up a reusable library of characters, styles, and reference assets that compounds across projects.

Curate and revisit everything an agent generated, instead of losing it once the script exits.

Combine terminal automation with desktop-based review and curation over a shared local workspace.

Project structure

imagent/
  apps/
    desktop/       # @imagent/studio, Electron desktop application
    cli/           # @imagent/cli, command-line interface
  packages/
    core/          # domain types, ports, and job runtime logic
    providers/     # provider adapters and model catalog
    persistence/   # SQLite, migrations, repositories, file and thumbnail handling
    config/        # configuration and secret management
    ipc/           # desktop IPC contract
    ui/            # shared UI components
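The split between core (ports) and providers (adapters) suggests a ports-and-adapters layout. The sketch below shows what that pattern generally looks like. All type and class names here are hypothetical and are not taken from the imagent source, and the fake adapter returns placeholder bytes instead of calling any real provider API.

```typescript
// Hypothetical request and result shapes.
interface ImageRequest {
  prompt: string;
  model: string;
}

interface ImageResult {
  provider: string;
  model: string;
  bytes: Uint8Array;
}

// The "port": what core logic would expect from any image provider.
interface ImageProviderPort {
  readonly id: string;
  generateImage(request: ImageRequest): Promise<ImageResult>;
}

// A stand-in "adapter" that fabricates bytes locally for demonstration.
class FakeProvider implements ImageProviderPort {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  async generateImage(request: ImageRequest): Promise<ImageResult> {
    const bytes = new TextEncoder().encode(`${this.id}:${request.prompt}`);
    return { provider: this.id, model: request.model, bytes };
  }
}

// Lets callers switch providers by id without changing the call site.
class ProviderRegistry {
  private readonly providers = new Map<string, ImageProviderPort>();

  register(provider: ImageProviderPort): void {
    this.providers.set(provider.id, provider);
  }

  get(id: string): ImageProviderPort {
    const provider = this.providers.get(id);
    if (!provider) {
      throw new Error(`unknown provider: ${id}`);
    }
    return provider;
  }
}
```

Because callers depend only on the port, swapping one provider for another becomes a change of id rather than a change of calling convention, which matches the "one interface" goal stated earlier.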

Current status

imagent remains in an early stage. Data structures, packaging, and parts of the feature set may continue to evolve. The current version does not include telemetry, automatic updates, cloud sync, or account systems. Desktop packages are unsigned, so macOS may require removing quarantine and Windows may show a SmartScreen warning on first launch.

License

By contributing to imagent, you agree that your contributions will be licensed under the project's Apache License 2.0. You also confirm that you have the right to submit the work under that license.

Acknowledgements

Phosphor icons, Radix UI, Tailwind CSS v4, Bun, Turborepo, Vite, Electron, Commander, zod, zustand, better-sqlite3, sharp, ffmpeg-static, @dnd-kit.

About

Imagine + Agent

unliftedq.github.io/imagent/

Releases: 11 in total. Latest: v0.3.1 (Jun 13, 2026).