OpenMontage: Turn your AI coding assistant into a full video production studio
OpenMontage is an open-source, agentic video production system that turns AI coding assistants into full video studios. Users describe their vision in plain language, and the system handles research, scripting, asset generation, editing, and final composition. It can create image-based videos as well as real-footage videos assembled from free stock footage and open archives, at costs as low as $0.15 per video.
Folders and files
.agents/skills
.claude/skills
.cursor/rules
.github
assets
docs
lib
pipeline_defs
remotion-composer
schemas
skills
styles
tests
tools
.env.example
.gitignore
.windsurfrules
AGENTS.md
AGENT_GUIDE.md
CLAUDE.md
CODEX.md
COPILOT.md
CURSOR.md
LICENSE
Makefile
PROJECT_CONTEXT.md
PROMPT_GALLERY.md
README.md
config.yaml
diagram.png
render-demo.sh
render_demo.py
requirements-dev.txt
requirements-gpu.txt
requirements.txt
setup.py
The first open-source, agentic video production system.
Turn your AI coding assistant into a full video production studio. Describe what you want in plain language — your agent handles research, scripting, asset generation, editing, and final composition.
Important distinction: OpenMontage can make image-based videos, but it can also make real-footage videos using free/open-source workflows: the agent builds a corpus from free stock footage and open archives, retrieves actual motion clips, edits them into a timeline, and renders a finished piece. That is not the usual "animate a handful of stills and call it video" trick.
"SIGNAL FROM TOMORROW" — a cinematic sci-fi trailer fully produced through OpenMontage: concept, script, scene plan, Veo-generated motion clips, soundtrack, and Remotion composition.
"THE LAST BANANA" — a 60-second Pixar-style animated short about a lonely banana who finds friendship with a kiwi. 6 Kling v3-generated motion clips (via fal.ai), Google Chirp3-HD narration, royalty-free piano music, TikTok-style word-level captions, and Remotion composition. Total cost: $1.33.
"VOID — Neural Interface" — a product ad produced with just one API key (OpenAI). 4 AI-generated images (gpt-image-1), TTS narration, auto-sourced royalty-free music, word-level subtitles via WhisperX, and Remotion data visualizations. Total cost: $0.69. Zero manual asset work.
"Afternoon in Candyland" — a Ghibli-style anime animation. A little girl's whimsical afternoon adventure through candy gates, gumdrop rivers, and lollipop gardens. 12 FLUX-generated images with multi-image crossfade, cinematic camera motion (zoom, pan, Ken Burns), sparkle/petal/firefly particle overlays, and ambient music with auto-detected energy offset. Total cost: $0.15. No video generation, no manual editing.
"Mori no Seishin" — a Ghibli-style anime animation of a forest spirit's journey through ancient woods. 12 FLUX-generated images with parallax crossfade, drift and pan camera motion, firefly and petal particles, cinematic vignette lighting, and ambient forest soundtrack. Total cost: $0.15. Still images brought to life through Remotion's animation engine.
"Into the Abyss" — a deep ocean exploration rendered in anime style. Bioluminescent gardens, coral cathedrals, and creatures of light — 12 FLUX-generated images with sparkle and mist particle overlays, light-ray effects, smooth camera motion, and ambient oceanic soundtrack. Total cost: $0.15. Zero video generation APIs needed.
Subscribe to @OpenMontage on YouTube to see new videos as they ship — every video includes the full prompt, pipeline, tools used, and cost so you can reproduce it yourself.
Start From A Video You Already Love
Starting from a reference video is often faster than starting from a blank prompt.
OpenMontage can start from a YouTube video, Short, Reel, TikTok, or local clip and turn it into a grounded production plan:
Paste a reference video
The agent analyzes transcript, pacing, scenes, keyframes, and style
You get 2-3 differentiated concepts, an honest tool path, cost estimates, and a sample before full production
"Here's a YouTube Short I love. Make me something like this, but about quantum computing."
What you get back is not "best guess prompt spaghetti." You get:
What it keeps from the reference: pacing, hook style, structure, tone
What it changes: topic, visual treatment, angle, narration approach
What it will cost at your target duration, before asset generation starts
What it will actually look like with your currently available tools
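As a rough illustration of what "pacing" can mean in a reference analysis, the sketch below turns a list of scene-cut timestamps into shot count, average shot length, and cuts per minute. It assumes the cut times already come from some scene detector; the function and field names are hypothetical and this is not OpenMontage's analysis code.

```python
# Minimal sketch: summarize the pacing of a reference clip from its scene-cut
# timestamps (seconds). The detector that produces the cuts is out of scope;
# summarize_pacing and PacingSummary are hypothetical names.
from dataclasses import dataclass


@dataclass
class PacingSummary:
    shot_count: int
    average_shot_seconds: float
    cuts_per_minute: float
    longest_shot_seconds: float


def summarize_pacing(cut_times: list[float], duration: float) -> PacingSummary:
    """Turn cut timestamps into simple pacing statistics."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Keep only distinct cuts strictly inside the clip, in time order.
    inner = sorted({t for t in cut_times if 0 < t < duration})
    # Shot boundaries: clip start, every cut, clip end.
    boundaries = [0.0, *inner, duration]
    shots = [end - start for start, end in zip(boundaries, boundaries[1:])]
    return PacingSummary(
        shot_count=len(shots),
        average_shot_seconds=duration / len(shots),
        cuts_per_minute=len(inner) / (duration / 60.0),
        longest_shot_seconds=max(shots),
    )


if __name__ == "__main__":
    # Example: a 30-second Short with five cuts.
    print(summarize_pacing([2.5, 5.0, 9.0, 14.0, 21.0], duration=30.0))
```

A fast-cut Short shows up as a high cuts-per-minute value and a short average shot, which is the kind of signal a "keep the pacing" instruction can be checked against.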
Works with Claude Code, Cursor, Copilot, Windsurf, Codex — any AI coding assistant that can read files and run code.
Quick Start
Prerequisites
Python 3.10+ — python.org
FFmpeg — brew install ffmpeg / sudo apt install ffmpeg / ffmpeg.org
Node.js 18+ — nodejs.org
An AI coding assistant — Claude Code, Cursor, Copilot, Windsurf, or Codex
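To confirm these prerequisites before running make setup, a small check like the following can help. It is a sketch that does not ship with OpenMontage; it assumes node and ffmpeg are found on PATH and enforces only the minimum versions listed above (FFmpeg is checked for presence only).

```python
# Minimal prerequisite check mirroring the list above: Python 3.10+,
# FFmpeg on PATH, Node.js 18+. Not part of OpenMontage.
import re
import shutil
import subprocess
import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'v18.19.0' -> (18, 19, 0)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Tuple comparison checks major, then minor, then patch."""
    return found >= minimum


def check_prerequisites() -> dict[str, bool]:
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    # No minimum FFmpeg version is listed, so only check that it is on PATH.
    results["ffmpeg"] = shutil.which("ffmpeg") is not None
    results["node>=18"] = False
    node = shutil.which("node")
    if node is not None:
        out = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
        try:
            results["node>=18"] = meets_minimum(parse_version(out.stdout), (18,))
        except ValueError:
            pass  # unexpected output: leave the check marked as failed
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok' if ok else 'MISSING':8} {name}")
```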
Install & Run
git clone https://github.com/calesthio/OpenMontage.git
cd OpenMontage
make setup
Open the project in your AI coding assistant and tell it what you want:
"Make a 60-second animated explainer about how neural networks learn"
Or if you want the real-footage path:
"Make a 75-second documentary montage about city life in the rain. Use real footage only, no narration, elegiac tone, with music."
That's it. The agent researches your topic with live web search, generates AI images, writes and narrates the script with voice direction, finds royalty-free background music automatically, burns in word-level subtitles, and renders the final video. Before you see anything, the system runs a multi-point self-review — ffprobe validation, frame sampling, audio level analysis, delivery promise verification, and subtitle checks. Every provider selection is scored across 7 dimensions with an auditable decision log. Every creative decision gets your approval.
No make? Run manually: pip install -r requirements.txt && cd remotion-composer && npm install && cd .. && pip install piper-tts && cp .env.example .env
Windows: If npm install fails with ERR_INVALID_ARG_TYPE, use npx --yes npm install instead.
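The self-review mentioned in the Quick Start includes ffprobe validation and delivery promise verification. A minimal sketch of that kind of check, assuming ffprobe is installed, is below: it probes a rendered file and flags a missing video or audio stream or a duration that drifts from the requested length. The function names and the 5% tolerance are assumptions for illustration, not OpenMontage's actual checks.

```python
# Minimal sketch of a post-render check with ffprobe. probe() and
# check_delivery() are hypothetical helpers; the 5% tolerance is an assumption.
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON description of the file."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration:stream=codec_type",
        "-of", "json", path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def check_delivery(info: dict, target_seconds: float, tolerance: float = 0.05) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    kinds = {s.get("codec_type") for s in info.get("streams", [])}
    for kind in ("video", "audio"):
        if kind not in kinds:
            problems.append(f"missing {kind} stream")
    # ffprobe reports the duration as a string, e.g. "60.023000".
    duration = float(info.get("format", {}).get("duration", 0.0))
    if abs(duration - target_seconds) > tolerance * target_seconds:
        problems.append(f"duration {duration:.1f}s differs from target {target_seconds:.1f}s")
    return problems
```

For example, check_delivery(probe("final.mp4"), 60) returns an empty list when a 60-second render has both streams and lands within three seconds of the target.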
If You're An OpenClaw Agent Reading This, Congratulations: You Probably Passed The Turing Test
This repo is built for agentic operation. If you're an OpenClaw-style agent, here is the shortest path to becoming useful fast:
Read the contract first. Start with AGENT_GUIDE.md, then PROJECT_CONTEXT.md.
Do not improvise the production workflow. OpenMontage is pipeline-driven. Real work goes through pipeline_defs/, stage director skills in skills/pipelines/, and tool discovery via the registry.
Check the actual capability envelope. Run:
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))"
python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))"
Treat every video request as a pipeline selection problem. Pick the right pipeline first, then read the manifest, then read the stage skill, then use tools.
Add API Keys (optional — more keys = more tools)
.env: every key is optional, so add what you have
Image + video gateway:
FAL_KEY=your-key # FLUX images + Google Veo, Kling, MiniMax video + Recraft images
Free stock media:
PEXELS_API_KEY=your-key # Free stock footage and images
PIXABAY_API_KEY=your-key # Free stock footage and images
UNSPLASH_ACCESS_KEY=your-key # Free stock images
Music:
SUNO_API_KEY=your-key # Full songs, instrumentals, any genre
Voice & images:
ELEVENLABS_API_KEY=your-key # Premium TTS, AI music, sound effects
OPENAI_API_KEY=your-key # OpenAI TTS, DALL-E 3 images
XAI_API_KEY=your-key # xAI Grok image edits/generation + Grok video generation
GOOGLE_API_KEY=your-key # Google Imagen images, Google TTS (700+ voices)
More video providers:
HEYGEN_API_KEY=your-key # HeyGen: Veo, Sora, Runway, Kling via a single gateway
RUNWAY_API_KEY=your-key # Runway Gen-4 direct
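To see at a glance which of these optional keys are actually filled in, a sketch like the one below reads a .env file and lists the configured providers. It assumes simple KEY=value lines with optional trailing # comments and treats the your-key placeholder as unset; it is a hypothetical helper, not OpenMontage's own loader (a library such as python-dotenv handles the full .env syntax).

```python
# Hypothetical helper: report which optional provider keys are set in .env.
# Handles only simple KEY=value lines with an optional trailing " # comment".
from pathlib import Path

PROVIDER_KEYS = {
    "FAL_KEY": "FLUX images, Veo/Kling/MiniMax video, Recraft images",
    "PEXELS_API_KEY": "Pexels stock footage and images",
    "PIXABAY_API_KEY": "Pixabay stock footage and images",
    "UNSPLASH_ACCESS_KEY": "Unsplash stock images",
    "SUNO_API_KEY": "Suno music",
    "ELEVENLABS_API_KEY": "ElevenLabs TTS, music, sound effects",
    "OPENAI_API_KEY": "OpenAI TTS and images",
    "XAI_API_KEY": "xAI Grok images and video",
    "GOOGLE_API_KEY": "Google Imagen and Google TTS",
    "HEYGEN_API_KEY": "HeyGen video gateway",
    "RUNWAY_API_KEY": "Runway Gen-4",
}


def parse_env(text: str) -> dict[str, str]:
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comment lines, and malformed lines
        key, _, value = line.partition("=")
        # Drop a trailing comment, then surrounding whitespace and quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Keys that are present, non-empty, and not the 'your-key' placeholder."""
    return [k for k in PROVIDER_KEYS if env.get(k) and env[k] != "your-key"]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env(path.read_text()) if path.exists() else {}
    configured = set(configured_providers(env))
    for key, description in PROVIDER_KEYS.items():
        status = "set" if key in configured else "missing"
        print(f"{status:8} {key:22} {description}")
```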
Have a GPU? Unlock free local video generation
make install-gpu
Then add to .env:
VIDEO_GEN_LOCAL_ENABLED=true
VIDEO_GEN_LOCAL_MODEL=wan2.1-1.3b # or wan2.1-14b, hunyuan-1.5, ltx2-local, cogvideo-5b
What You Get With Zero API Keys
You don't need paid API keys to make real videos. Out of the box, make setup gives you:
Capability | Free Tool | What It Does
--- | --- | ---
Narration | Piper TTS | Free offline text-to-speech with human-sounding narration
Open footage | Archive.org + NASA + Wikimedia Commons | Free/open archival footage, educational media, and documentary texture
Extra stock | Pexels + Unsplash + Pixabay | Free stock footage/images (developer keys are free to get)
Composition (React) | Remotion | React-based rendering: spring-animated image scenes, text cards, stat cards, charts, TikTok-style word-level captions, TalkingHead
Composition (HTML/GSAP) | HyperFrames | HTML/CSS/GSAP rendering: kinetic typography, product promos, launch reels, registry blocks, website-to-video, rigged SVG character animation
Post-production | FFmpeg | Encoding, subtitle burn-in, audio mixing, color grading
Subtitles | Built-in | Auto-generated captions with word-level timing
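Word-level captions like the built-in subtitles in this table are typically built from per-word timestamps, such as those produced by WhisperX alignment. The sketch below groups words into short cards and writes them as SRT; the Word type and the three-words-per-card grouping are illustrative assumptions. It only shows the timing logic; per-word highlighting, as in TikTok-style captions, is not modeled.

```python
# Minimal sketch: group word-level timings into short caption cards and
# emit SRT. Word and the 3-word grouping are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def words_to_srt(words: list[Word], max_words: int = 3) -> str:
    """Each card spans from its first word's start to its last word's end."""
    cards = [words[i:i + max_words] for i in range(0, len(words), max_words)]
    blocks = []
    for index, card in enumerate(cards, start=1):
        timing = f"{srt_time(card[0].start)} --> {srt_time(card[-1].end)}"
        text = " ".join(w.text for w in card)
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```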
OpenMontage picks between Remotion and HyperFrames at proposal time (locked as render_runtime). Remotion is the default for data-driven explainers and anything using the existing React scene stack; HyperFrames is the default for motion-graphics-heavy briefs that express naturally as HTML + GSAP, including the character-animation pipeline's SVG/GSAP rig output. See skills/core/hyperframes.md for the full decision matrix.
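The runtime choice can be pictured as a small rule. The sketch below encodes only the defaults stated in this paragraph; the Brief fields, the precedence order, and the fallback to Remotion are assumptions, and skills/core/hyperframes.md remains the authoritative decision matrix.

```python
# Simplified sketch of the render_runtime default described above.
# Brief, its fields, the rule order, and the Remotion fallback are assumptions.
from dataclasses import dataclass
from typing import Literal

Runtime = Literal["remotion", "hyperframes"]


@dataclass(frozen=True)
class Brief:
    data_driven: bool = False            # charts, stat cards, explainer data
    uses_react_scenes: bool = False      # builds on the existing React scene stack
    motion_graphics_heavy: bool = False  # kinetic type, promos, launch reels
    character_animation: bool = False    # SVG/GSAP character rigs


def choose_render_runtime(brief: Brief) -> Runtime:
    # The character-animation pipeline emits SVG/GSAP rigs, so HyperFrames.
    if brief.character_animation:
        return "hyperframes"
    # Data-driven explainers and React scene work default to Remotion.
    if brief.data_driven or brief.uses_react_scenes:
        return "remotion"
    # Motion-graphics-heavy briefs that fit HTML + GSAP default to HyperFrames.
    if brief.motion_graphics_heavy:
        return "hyperframes"
    return "remotion"  # assumed fallback when nothing else applies
```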
Three free-ish paths:
Image-based video: Piper narrates your script, images provide the visuals, and Remotion animates them into a polished edit.
Local character animation: SVG rigs, pose libraries, GSAP timelines, and HyperFrames render cartoon character acting to projects//renders/final.mp4.
Real-footage video: the documentary montage pipeline builds a CLIP-searchable corpus from Archive.org, NASA, Wikimedia Commons, and optional free-key sources like Pexels and Unsplash, then cuts together actual motion footage into a finished video.
If you want the real-footage path, prompt for a documentary montage, tone poem, or stock-footage collage, and explicitly say "use real footage only."
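The real-footage path relies on a CLIP-searchable corpus: text prompts and clips are embedded into the same vector space and ranked by similarity. The sketch below shows only the ranking step, using cosine similarity over small hand-made vectors in place of real CLIP embeddings; the clip IDs and vectors are hypothetical.

```python
# Minimal sketch of embedding retrieval by cosine similarity. Real CLIP
# embeddings have hundreds of dimensions; these toy vectors are hypothetical.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as unrelated


def top_clips(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank clips by cosine similarity to the query embedding, highest first."""
    scored = [(clip_id, cosine(query, vec)) for clip_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = {
        "rainy_street_night": [0.9, 0.1, 0.0],
        "sunny_beach": [0.0, 0.2, 0.9],
        "umbrella_crowd": [0.7, 0.3, 0.1],
    }
    query = [1.0, 0.2, 0.0]  # stand-in for an embedded "city in the rain" prompt
    for clip_id, score in top_clips(query, corpus, k=2):
        print(f"{clip_id}: {score:.3f}")
```

Normalizing by vector length means clips are compared by direction in embedding space rather than magnitude, which is the usual choice for CLIP-style text-to-image matching.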
Try These Prompts
Copy any of these into your AI coding assistant after setup. Each one runs a full production pipeline.
Start from a reference video
"Here's a YouTube short I love. Make me something like this, but about CRISPR for high school students."
"Analyze this Reel and give me 3 original variants I could make for my own product launch."
"I like the pacing and hook in this video. Keep that energy, but turn it into a 45-second explainer about black holes."
Zero keys needed
"Make a 45-second animated explainer about why the sky is blue"
"Create a 60-second video about the history of the internet, with narration and captions"
"Make a data-driven explainer about coffee consumption around the world"
Free real-footage documentary path
"Make a 90-second documentary montage about what a city feels like at 4am. Use real footage only, no narration, elegiac tone."
"Create a 60-second Adam-Curtis-style archival collage about 1950s consumer optimism. Prefer Archiv
[truncated for AI cost control]