Show HN: Vivijure – Self-hosted AI film studio on your own GPU (AGPL)
Vivijure is a self-hosted AI film studio that runs on Cloudflare Workers and your own GPU. It enables you to create films from storyboards with features like keyframe generation, character LoRAs, music scoring, TTS narration, and lip-sync dialogue. You own all artifacts and can use various motion backends.
Write a storyboard. Render it to video on your own GPU. No subscription, no account wall, no lock-in. You bring the GPU and the keys; the studio brings the pipeline.
Vivijure is a self-hosted AI film studio built on Cloudflare Workers. It runs free on the Workers free tier and connects to whatever GPU backend you attach -- RunPod, your own box, or a cloud motion API. You own every artifact.
Showcase: four films -- silent, scored, narrated, and now talking
Four real films rendered end to end on Vivijure, unedited and straight off the pipeline. The first three are a silent picture, one scored with a generated music bed, and one narrated with TTS, with motion from the own-GPU Wan, Seedance cloud, and Kling cloud backends respectively. The newest, Vivijure Speaks, adds a character lip-synced to its own dialogue, rendered on a self-hosted GPU.
NEON HALFLIFE -- silent (own-GPU Wan i2v)
NEON HALFLIFE: the first film rendered end to end on Vivijure. 1080p, ten shots, 30 seconds. Motion on a self-hosted GPU (the own-gpu Wan I2V backend). Click the frame above to play, or download the MP4 (29 MB).
This clip is silent on purpose. Vivijure assembles a silent picture by default; scoring (a music bed, TTS narration, beat-synced cuts) is an opt-in Audio step you run after the picture locks. This is the picture straight off the pipeline, before any audio pass.
What makes it the proof and not just a demo: this was the first unattended full run, and it came out clean. Zero clips dropped (ten of ten shots rendered). It also recovered itself: the finish phase stalled partway through, the orchestrator re-adopted the in-flight work, and the film finished, all of it across a session restart with nobody watching. The system healing its own stall, unattended, is the part we are actually proud of.
FUR AND CIRCUITS -- scored, music bed (Seedance cloud i2v)
FUR AND CIRCUITS: eight shots, scored with a generated music bed (MiniMax Music module). Motion on Seedance cloud i2v; two character LoRAs trained from cast portraits. Click the frame above to play, or download the MP4 (43 MB).
The scored mode: the picture locks and the Audio step attaches a generated music bed, beat-synced to the edit. The music is generated, not licensed -- produced by the MiniMax Music module, staged to R2, and muxed into the final MP4. The whole pipeline, including scoring, ran unattended.
RUST -- narrated, TTS (Kling cloud i2v)
RUST: three shots, narrated with TTS (MiniMax Speech module). Motion on Kling cloud i2v; two character LoRAs (Salvage Robot and Companion Robot). Click the frame above to play, or download the MP4 (33 MB).
The narrated mode: TTS reads the script over the cut, no music bed. Generated by the MiniMax Speech module directly from the storyboard text, staged to R2, and muxed into the final MP4. Narration is a drop-in alternative to the music bed in the same scoring chain.
Vivijure Speaks -- talking, lip-sync + upscale (own-GPU Wan i2v)
Vivijure Speaks: two shots, about two and a half seconds, 1080p. A talking character lip-synced to its own dialogue and upscaled (per-shot dialogue TTS, then the MuseTalk lip-sync module and a CUDA Real-ESRGAN pass over an interpolated clip). Motion on a self-hosted GPU (the own-gpu Wan I2V backend). Click the frame above to play, or download the MP4.
The talking mode: per shot, a generated line of dialogue is muxed into the clip and MuseTalk drives the character's mouth to match it. It came out silent the first time; a from-scratch re-fire then surfaced two more orchestration bugs (a backend phantom-keyframe and a finish-step wedge) before any user could hit them. The honest writeup tells the three-fix story.
Ecosystem
slate --> vivijure --> vivijure-backend
| Repo | Role |
| --- | --- |
| slate | Collaborative AI screenwriter Discord bot -- shapes the film in-channel, then hands it to vivijure to render |
| vivijure | AI film studio control plane (Cloudflare Worker) -- planner, cast, render UI; orchestrates render jobs |
| vivijure-backend | GPU render backend (RunPod serverless) -- SDXL keyframes, i2v, finish, assemble |
Storyboard planner -- write scenes, edit shot prompts, and set per-shot cast assignments before bundling:
Cast -- register characters with portraits and visual bibles; Slate syncs here directly from Discord:
Module host -- installed modules appear here; each stage (plan, cast, keyframe, motion, finish, score) is served by a swappable module worker:
Render history -- honest per-render status. The panel surfaces real failed attempts alongside completed renders (here, three failed runs and one completed), with inline error snippets; it shows what actually happened, not a curated success:
What you can do
Write a storyboard -- scenes, shot descriptions, character beats -- in the planner.
Generate SDXL keyframes per shot on your GPU (preview before committing to full motion).
Animate each shot with Wan 2.2 I2V on your own GPU, or any of six cloud motion backends (Kling, Seedance, MiniMax Hailuo, Google Veo, Vidu Q3, Wan 2.6) -- seven in all, mix and match per shot, any aspect ratio.
Cast characters -- upload portraits, generate LoRA training sets, train a character LoRA on your GPU so your cast looks consistent across shots.
Score the film -- attach a music bed, narrate it with TTS, or beat-sync cuts.
Give characters a voice -- generate per-shot dialogue, lip-sync it with MuseTalk, and upscale the result with CUDA Real-ESRGAN, as opt-in finish modules over the same motion path.
Download the assembled silent MP4 or mux in audio without touching the GPU at all.
Everything beyond keyframes uses your own R2 bucket for artifacts; you are never renting storage from us.
Why not just use a SaaS?
Because you run Proxmox. Because you have a V100 or an H100 and you do not want to pay $0.80 a second to someone else's GPU. Because you want to swap the motion model, adjust the sampler, and not file a support ticket to do it.
Vivijure is for the creative homelabber who is priced out of subscription AI video tools and prefers to own the stack. The control plane is on Cloudflare's free tier (no server to run); the GPU work hits whatever endpoint you point it at; the artifacts land in your R2 bucket.
Quick start
1. Clone and install
git clone https://github.com/skyphusion-labs/vivijure
cd vivijure
npm install
2. Configure
Edit wrangler.toml: add your R2 bucket, D1 database, and module service bindings.
Set secrets (RunPod key, CF Access token for R2, AI Gateway) via wrangler secret put.
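As a rough illustration of step 2: the bindings and secrets you configure reach the Worker as properties of its `env` object, so a small startup check can catch a missing one early. This is a sketch only. Every name below (`ARTIFACTS`, `DB`, `RUNPOD_API_KEY`) is hypothetical; the real names are whatever your wrangler.toml and `wrangler secret put` calls declare.

```typescript
// Hypothetical binding and secret names; substitute the ones declared in
// your own wrangler.toml and set through `wrangler secret put`.
const REQUIRED_BINDINGS = ["ARTIFACTS", "DB"] as const; // R2 bucket, D1 database
const REQUIRED_SECRETS = ["RUNPOD_API_KEY"] as const;   // assumed secret name

type EnvLike = Record<string, unknown>;

// Returns the required names that are absent from env
// (undefined, null, or an empty string).
function missingConfig(env: EnvLike): string[] {
  const required: readonly string[] = [...REQUIRED_BINDINGS, ...REQUIRED_SECRETS];
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value === null || value === "";
  });
}
```

Calling `missingConfig(env)` at the top of a request handler and returning a clear error when the list is non-empty tends to be easier to debug than a failure deep inside a render.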
3. Develop locally
npm run dev # wrangler dev -- hot reload at localhost:8787
4. Deploy
npm run deploy # wrangler deploy
See CLAUDE.md for conventions and docs/module-authoring.md for how to write your own module worker.
Architecture
Vivijure is a module host, not a monolith. The core worker owns what is always true -- project, storyboard, cast, bundle assembly, render orchestration, and a module registry. Every capability beyond that is an opt-in module worker plugged into the pipeline through a typed hook contract.
Install only the modules you want. The studio UI assembles itself from GET /api/modules -- it never hardcodes a feature section. Install none and you get a clean, empty studio.
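A minimal sketch of what "assembles itself" could look like in practice: the UI groups the installed modules by the hook they serve and renders one section per hook. The response shape here (an array of objects with `id`, `name`, and `hooks`) is an assumption for illustration, not the project's actual schema; only the `GET /api/modules` route comes from the text above.

```typescript
// Assumed shape of one entry returned by GET /api/modules. The real
// response is defined by the project; these fields are illustrative.
interface InstalledModule {
  id: string;
  name: string;
  hooks: string[]; // e.g. "keyframe", "motion.backend", "finish", "score"
}

interface UiSection {
  hook: string;
  modules: string[]; // display names of the modules serving this hook
}

// Build UI sections purely from the installed-module list: one section per
// hook that at least one module serves. No modules means no sections.
function buildSections(installed: InstalledModule[]): UiSection[] {
  const byHook = new Map<string, string[]>();
  for (const mod of installed) {
    for (const hook of mod.hooks) {
      const names = byHook.get(hook) ?? [];
      names.push(mod.name);
      byHook.set(hook, names);
    }
  }
  return Array.from(byHook.entries()).map(([hook, modules]) => ({ hook, modules }));
}

// Fetch the list from the endpoint named above and turn it into sections.
// Assumes the endpoint responds with a JSON array of InstalledModule.
async function loadSections(baseUrl: string): Promise<UiSection[]> {
  const res = await fetch(`${baseUrl}/api/modules`);
  if (!res.ok) throw new Error(`GET /api/modules failed: ${res.status}`);
  return buildSections((await res.json()) as InstalledModule[]);
}
```

With an empty module list `buildSections` returns an empty array, which matches the "clean, empty studio" behavior described above.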
core (this worker)
  |-- keyframe hook   --> your SDXL keyframe module (GPU)
  |-- motion.backend  --> GPU i2v module OR cloud motion module (per shot)
  |-- finish          --> interpolation / upscale / lip-sync (optional chain)
  |-- score           --> music / narration / beat-sync (optional chain)
  |-- plan.enhance    --> LLM auto-direction before render (optional)
  |-- cast.image      --> portrait -> LoRA training set (optional)
  '-- notify          --> render-done email / webhook (optional)
The module contract is vivijure-module/1 in src/modules/types.ts. A module is a Cloudflare Worker that serves GET /module.json (manifest) and POST /invoke (run a hook). That is the whole interface; a module in another language, on another platform, works fine as long as it speaks JSON over HTTP.
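To make the two-route interface concrete, here is a minimal sketch of a module worker. Only the routes (`GET /module.json`, `POST /invoke`) and the contract name `vivijure-module/1` come from the text above. The manifest fields, the `{ hook }` request body, and the reply shape are assumptions for illustration; the authoritative definitions are in src/modules/types.ts.

```typescript
// Hypothetical manifest and invoke shapes. Only the two routes and the
// contract name "vivijure-module/1" come from the README; consult
// src/modules/types.ts for the real field names.
const MANIFEST = {
  contract: "vivijure-module/1",
  id: "example-notify", // assumed field
  hooks: ["notify"],    // assumed field
};

interface Reply {
  status: number;
  body: unknown;
}

// Pure routing core, kept separate from the fetch wrapper so it is easy to test.
function route(method: string, path: string, payload: unknown): Reply {
  if (method === "GET" && path === "/module.json") {
    return { status: 200, body: MANIFEST };
  }
  if (method === "POST" && path === "/invoke") {
    const hook = (payload as { hook?: unknown } | null)?.hook;
    if (typeof hook !== "string" || !MANIFEST.hooks.includes(hook)) {
      return { status: 400, body: { error: "unsupported hook" } };
    }
    // A real module would do its work here (send an email, call a GPU, ...).
    return { status: 200, body: { ok: true, hook } };
  }
  return { status: 404, body: { error: "not found" } };
}

// Cloudflare Workers module-syntax entry point.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    // Treat an unparseable POST body as null so route() answers with a 400.
    const payload = request.method === "POST"
      ? await request.json().catch(() => null)
      : null;
    const reply = route(request.method, pathname, payload);
    return new Response(JSON.stringify(reply.body), {
      status: reply.status,
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Because the interface is plain JSON over HTTP, the same `route` logic could be ported to any language or platform that can serve those two paths.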
See docs/module-api.md for the full contract and docs/module-authoring.md for the step-by-step guide.
The GPU render backend is vivijure-backend (RunPod serverless, SDXL + Wan I2V + ffmpeg assemble). The studio UI lives at vivijure.skyphusion.org (/planner, /cast, /modules).
How a render flows
The path from a storyboard to a finished film.mp4. The keyframe fans into both the dialogue and the motion backend; any of seven motion backends (own-GPU or cloud) renders the clip; the opt-in finish chain interpolates, lip-syncs, and upscales it; then the shots gather, assemble, and mux. Drawn out, it is a real studio pipeline, not a wrapper.
flowchart LR
  SB([Storyboard]) --> KF[Keyframe SDXL on GPU]
  KF --> DLG[Dialogue per-shot TTS]
  KF --> MB{motion.backend}
  MB -->|own-gpu| WAN[Wan i2v your GPU]
  MB -->|cloud| CLD[Kling / Wan 2.6 / Seedance / Hailuo / Veo / Vidu]
  WAN --> RIFE
  CLD --> RIFE
  subgraph FIN [finish chain, opt-in]
    RIFE[RIFE interpolate] --> LS[MuseTalk lip-sync] --> UP[CUDA Real-ESRGAN upscale] --> OV[text overlay]
  end
  DLG --> LS
  OV --> ASM[Gather + assemble keepClipAudio]
  ASM --> MUX[Mux audio] --> FILM[(film.mp4)]
Motion is backend-agnostic: the same keyframe feeds own-GPU Wan or any cloud i2v module, and the finish chain runs the same way over whatever clip comes back. The dialogue track is generated per shot, drives the lip-sync, and rides through assembly into the final mux.
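The backend-agnostic part of that flow can be sketched as a small loop: pick the motion module the shot names, then thread the resulting clip through the finish chain in order. This is an illustration under stated assumptions, not the orchestrator's actual code. The `Artifact`, `Shot`, and `Invoke` types are hypothetical, the dialogue track is left out for brevity, and rendering shots concurrently is an assumption of this sketch.

```typescript
// A clip (or keyframe) is referenced by where it lives, e.g. a key in your R2 bucket.
interface Artifact {
  uri: string;
}

interface Shot {
  id: string;
  keyframe: Artifact;
  motionBackend: string; // module id chosen for this shot, own-GPU or cloud
}

// Hypothetical transport: calls one module's hook and resolves to its output.
// In the real system this would be an HTTP POST to the module's /invoke route.
type Invoke = (moduleId: string, hook: string, input: Artifact) => Promise<Artifact>;

// Render one shot: the keyframe goes to whichever motion backend the shot
// names, then the resulting clip is threaded through the finish chain in order.
async function renderShot(shot: Shot, finishChain: string[], invoke: Invoke): Promise<Artifact> {
  let clip = await invoke(shot.motionBackend, "motion.backend", shot.keyframe);
  for (const moduleId of finishChain) {
    clip = await invoke(moduleId, "finish", clip);
  }
  return clip;
}

// Assumes shots do not depend on each other, so they can render concurrently
// before the gather-and-assemble step. Output order matches input order.
async function renderAll(shots: Shot[], finishChain: string[], invoke: Invoke): Promise<Artifact[]> {
  return Promise.all(shots.map((shot) => renderShot(shot, finishChain, invoke)));
}
```

Note that `renderShot` never inspects which backend it is talking to; swapping own-GPU Wan for a cloud module changes only the `motionBackend` string on the shot.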
Develop
npm run typecheck   # tsc --noEmit (CI gate -- run before pushing)
npm test            # vitest
npm run dev         # wrangler dev
npm run deploy      # wrangler deploy
account_id comes from CLOUDFLARE_ACCOUNT_ID in the environment, not hardcoded. All bindings are in wrangler.toml (committed); secrets go in via wrangler secret put.
License
AGPL-3.0. Free as in yours.
About
Vivijure Studio: a module host for AI film production (AGPL). Thin Cloudflare Worker core + opt-in module workers behind a typed hook contract.