AI News HubLIVE
In-site rewrite5 min read

OpenMontage: Turn your AI coding assistant into a full video production studio

OpenMontage is an open-source, agentic video production system that turns AI coding assistants into full video studios. Users describe their vision in plain language, and the system handles research, scripting, asset generation, editing, and final composition. It can create both image-based and real-footage videos, using free stock footage and open archives, with costs as low as $0.15.

SourceHacker News AIAuthor: vantareed

Notifications You must be signed in to change notification settings

Fork 1.2k

Star 7.3k

BranchesTags

Open more actions menu

Folders and files

NameName

Last commit message

Last commit date

Latest commit

History

103 Commits

103 Commits

.agents/skills

.agents/skills

.claude/skills

.claude/skills

.cursor/rules

.cursor/rules

.github

.github

assets

assets

docs

docs

lib

lib

pipeline_defs

pipeline_defs

remotion-composer

remotion-composer

schemas

schemas

skills

skills

styles

styles

tests

tests

tools

tools

.env.example

.env.example

.gitignore

.gitignore

.windsurfrules

.windsurfrules

AGENTS.md

AGENTS.md

AGENT_GUIDE.md

AGENT_GUIDE.md

CLAUDE.md

CLAUDE.md

CODEX.md

CODEX.md

COPILOT.md

COPILOT.md

CURSOR.md

CURSOR.md

LICENSE

LICENSE

Makefile

Makefile

PROJECT_CONTEXT.md

PROJECT_CONTEXT.md

PROMPT_GALLERY.md

PROMPT_GALLERY.md

README.md

README.md

config.yaml

config.yaml

diagram.png

diagram.png

render-demo.sh

render-demo.sh

render_demo.py

render_demo.py

requirements-dev.txt

requirements-dev.txt

requirements-gpu.txt

requirements-gpu.txt

requirements.txt

requirements.txt

setup.py

setup.py

Repository files navigation

The first open-source, agentic video production system.

Paste A Video  · Quick Start  · Try These Prompts  · Pipelines  · How It Works  · Providers  · Agent Guide

Follow The Build

Turn your AI coding assistant into a full video production studio. Describe what you want in plain language — your agent handles research, scripting, asset generation, editing, and final composition.

Important distinction: OpenMontage can make image-based videos, but it can also make a real video video for free/open-source workflows: the agent builds a corpus from free stock footage and open archives, retrieves actual motion clips, edits them into a timeline, and renders a finished piece. That is not the usual "animate a handful of stills and call it video" trick.

signal-from-tomorrow_final_with_music_upload_v2.mp4

"SIGNAL FROM TOMORROW" — a cinematic sci-fi trailer fully produced through OpenMontage: concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.

the_last_banana_v3_github.mp4

"THE LAST BANANA" — a 60-second Pixar-style animated short about a lonely banana who finds friendship with a kiwi. 6 Kling v3-generated motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano music, TikTok-style word-level captions, and Remotion composition. Total cost: $1.33.

void-linkedin.mp4

"VOID — Neural Interface" — a product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.

candyland.mp4

"Afternoon in Candyland" — a Ghibli-style anime animation. A little girl's whimsical afternoon adventure through candy gates, gumdrop rivers, and lollipop gardens. 12 FLUX-generated images with multi-image crossfade, cinematic camera motion (zoom, pan, Ken Burns), sparkle/petal/firefly particle overlays, and ambient music with auto-detected energy offset. Total cost: $0.15. No video generation, no manual editing.

mori-no-seishin.mp4

"Mori no Seishin" — a Ghibli-style anime animation of a forest spirit's journey through ancient woods. 12 FLUX-generated images with parallax crossfade, drift and pan camera motion, firefly and petal particles, cinematic vignette lighting, and ambient forest soundtrack. Total cost: $0.15. Still images brought to life through Remotion's animation engine.

deep-ocean.mp4

"Into the Abyss" — a deep ocean exploration rendered in anime style. Bioluminescent gardens, coral cathedrals, and creatures of light — 12 FLUX-generated images with sparkle and mist particle overlays, light-ray effects, smooth camera motion, and ambient oceanic soundtrack. Total cost: $0.15. Zero video generation APIs needed.

Subscribe to @OpenMontage on YouTube to see new videos as they ship — every video includes the full prompt, pipeline, tools used, and cost so you can reproduce it yourself.

Start From A Video You Already Love

Starting from a reference video is often faster than starting from a blank prompt.

OpenMontage can start from a YouTube video, Short, Reel, TikTok, or local clip and turn it into a grounded production plan:

Paste a reference video

The agent analyzes transcript, pacing, scenes, keyframes, and style

You get 2-3 differentiated concepts, an honest tool path, cost estimates, and a sample before full production

"Here's a YouTube Short I love. Make me something like this, but about quantum computing."

What you get back is not "best guess prompt spaghetti." You get:

What it keeps from the reference: pacing, hook style, structure, tone

What it changes: topic, visual treatment, angle, narration approach

What it will cost at your target duration, before asset generation starts

What it will actually look like with your currently available tools

Works with Claude Code, Cursor, Copilot, Windsurf, Codex — any AI coding assistant that can read files and run code.

Quick Start

Prerequisites

Python 3.10+ — python.org

FFmpeg — brew install ffmpeg / sudo apt install ffmpeg / ffmpeg.org

Node.js 18+ — nodejs.org

An AI coding assistant — Claude Code, Cursor, Copilot, Windsurf, or Codex

Install & Run

git clone https://github.com/calesthio/OpenMontage.git cd OpenMontage make setup

Open the project in your AI coding assistant and tell it what you want:

"Make a 60-second animated explainer about how neural networks learn"

Or if you want the real-footage path:

"Make a 75-second documentary montage about city life in the rain. Use real footage only, no narration, elegiac tone, with music."

That's it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, and renders the final video. Before you see anything, the system runs a multi-point self-review — ffprobe validation, frame sampling, audio level analysis, delivery promise verification, and subtitle checks. Every provider selection is scored across 7 dimensions with an auditable decision log. Every creative decision gets your approval.

No make? Run manually: pip install -r requirements.txt && cd remotion-composer && npm install && cd .. && pip install piper-tts && cp .env.example .env

Windows: If npm install fails with ERR_INVALID_ARG_TYPE, use npx --yes npm install instead.

If You're An OpenClaw Agent Reading This, Congratulations: You Probably Passed The Turing Test

This repo is built for agentic operation. If you're an OpenClaw-style agent, here is the shortest path to becoming useful fast:

Read the contract first Start with AGENT_GUIDE.md, then PROJECT_CONTEXT.md.

Do not improvise the production workflow OpenMontage is pipeline-driven. Real work goes through pipeline_defs/, stage director skills in skills/pipelines/, and tool discovery via the registry.

Check the actual capability envelope Run:

python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))" python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))"

Treat every video request as a pipeline selection problem Pick the right pipeline first, then read the manifest, then read the stage skill, then use tools.

Add API Keys (optional — more keys = more tools)

.env — every key is optional, add what you have

Image + video gateway:

FAL_KEY=your-key # FLUX images + Google Veo, Kling, MiniMax video + Recraft images

Free stock media:

PEXELS_API_KEY=your-key # Free stock footage and images PIXABAY_API_KEY=your-key # Free stock footage and images UNSPLASH_ACCESS_KEY=your-key # Free stock images

Music:

SUNO_API_KEY=your-key # Full songs, instrumentals, any genre

Voice & images:

ELEVENLABS_API_KEY=your-key # Premium TTS, AI music, sound effects OPENAI_API_KEY=your-key # OpenAI TTS, DALL-E 3 images XAI_API_KEY=your-key # xAI Grok image edits/generation + Grok video generation GOOGLE_API_KEY=your-key # Google Imagen images, Google TTS (700+ voices)

More video providers:

HEYGEN_API_KEY=your-key # HeyGen — VEO, Sora, Runway, Kling via single gateway RUNWAY_API_KEY=your-key # Runway Gen-4 direct

Have a GPU? Unlock free local video generation

make install-gpu

Then add to .env:

VIDEO_GEN_LOCAL_ENABLED=true VIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b

What You Get With Zero API Keys

You don't need paid API keys to make real videos. Out of the box, make setup gives you:

Capability Free Tool What It Does

Narration Piper TTS Free offline text-to-speech — real human-sounding narration

Open footage Archive.org + NASA + Wikimedia Commons Free/open archival footage, educational media, and documentary texture

Extra stock Pexels + Unsplash + Pixabay Free stock footage/images (developer keys are free to get)

Composition (React) Remotion React-based rendering — spring-animated image scenes, text cards, stat cards, charts, TikTok-style word-level captions, TalkingHead

Composition (HTML/GSAP) HyperFrames HTML/CSS/GSAP rendering — kinetic typography, product promos, launch reels, registry blocks, website-to-video, rigged SVG character animation

Post-production FFmpeg Encoding, subtitle burn-in, audio mixing, color grading

Subtitles Built-in Auto-generated captions with word-level timing

OpenMontage picks between Remotion and HyperFrames at proposal time (locked as render_runtime). Remotion is the default for data-driven explainers and anything using the existing React scene stack; HyperFrames is the default for motion-graphics-heavy briefs that express naturally as HTML + GSAP, including the character-animation pipeline's SVG/GSAP rig output. See skills/core/hyperframes.md for the full decision matrix.

Two free-ish paths:

Image-based video: Piper narrates your script, images provide the visuals, and Remotion animates them into a polished edit.

Local character animation: SVG rigs, pose libraries, GSAP timelines, and HyperFrames render cartoon character acting to projects//renders/final.mp4.

Real-footage video: the documentary montage pipeline builds a CLIP-searchable corpus from Archive.org, NASA, Wikimedia Commons, and optional free-key sources like Pexels and Unsplash, then cuts together actual motion footage into a finished video.

If you want the second one, prompt for a documentary montage, tone poem, or stock-footage collage, and explicitly say use real footage only.

Try These Prompts

Copy any of these into your AI coding assistant after setup. Each one runs a full production pipeline.

Start from a reference video

"Here's a YouTube short I love. Make me something like this, but about CRISPR for high school students."

"Analyze this Reel and give me 3 original variants I could make for my own product launch."

"I like the pacing and hook in this video. Keep that energy, but turn it into a 45-second explainer about black holes."

Zero keys needed

"Make a 45-second animated explainer about why the sky is blue"

"Create a 60-second video about the history of the internet, with narration and captions"

"Make a data-driven explainer about coffee consumption around the world"

Free real-footage documentary path

"Make a 90-second documentary montage about what a city feels like at 4am. Use real footage only, no narration, elegiac tone."

"Create a 60-second Adam-Curtis-style archival collage about 1950s consumer optimism. Prefer Archiv

[truncated for AI cost control]