
Show HN: Vivijure – Self-hosted AI film studio on your own GPU (AGPL)

Vivijure is a self-hosted AI film studio: a control plane on Cloudflare Workers that renders on your own GPU. It turns a storyboard into a film with keyframe generation, character LoRAs, music scoring, TTS narration, and lip-synced dialogue. You own every artifact, and motion can come from your own GPU or from one of several cloud backends.

Source: Hacker News AI. Author: skyphusion.


Repository contents (252 commits): .github/workflows, containers, deploy, docs, migrations, modules, public, scripts, src, tests, .gitignore, CHANGELOG.md, CLAUDE.md, LICENSE, README.md, package-lock.json, package.json, tsconfig.json, vitest.config.ts, wrangler.toml.


Write a storyboard. Render it to video on your own GPU. No subscription, no account wall, no lock-in. You bring the GPU and the keys; the studio brings the pipeline.

Vivijure is a self-hosted AI film studio built on Cloudflare Workers. Its control plane runs at no cost on the Workers free tier and connects to whatever GPU backend you attach -- RunPod, your own box, or a cloud motion API. You own every artifact.

Showcase: four films -- silent, scored, narrated, and now talking

Four real films, each rendered end to end on Vivijure and left unedited, straight off the pipeline: a silent picture, one scored with a generated music bed, one narrated with TTS, and the newest, Vivijure Speaks, in which a character is lip-synced to its own dialogue on a self-hosted GPU. Motion spans the own-GPU Wan, Seedance cloud, and Kling cloud backends.

NEON HALFLIFE -- silent (own-GPU Wan i2v)

NEON HALFLIFE: the first film rendered end to end on Vivijure. 1080p, ten shots, 30 seconds. Motion on a self-hosted GPU (the own-gpu Wan I2V backend). MP4 download: 29 MB.

This clip is silent on purpose. Vivijure assembles a silent picture by default; scoring (a music bed, TTS narration, beat-synced cuts) is an opt-in Audio step you run after the picture locks. This is the picture straight off the pipeline, before any audio pass.

What makes it the proof and not just a demo: this was the first unattended full run, and it came out clean. Zero clips dropped (ten of ten shots rendered). It also recovered itself: the finish phase stalled partway through, the orchestrator re-adopted the in-flight work, and the film finished, all of it across a session restart with nobody watching. The system healing its own stall, unattended, is the part we are actually proud of.
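The recovery described above, where the orchestrator re-adopts in-flight work after a stall instead of starting the shot over, can be sketched roughly as follows. This is a hypothetical illustration, not Vivijure's orchestrator: the Job shape, its status values, the remoteId field, and the stallMs threshold are all assumptions.

```typescript
// Hypothetical job record; Vivijure's real schema is not shown in this README.
interface Job {
  id: string;
  shot: number;
  status: "queued" | "running" | "done" | "failed";
  remoteId?: string;     // id of a GPU/cloud job already submitted, if any
  lastHeartbeat: number; // ms since epoch of the last progress update
}

type Decision =
  | { kind: "leave"; job: Job }     // healthy or finished: nothing to do
  | { kind: "readopt"; job: Job }   // stalled but already submitted: resume polling it
  | { kind: "resubmit"; job: Job }; // stalled with nothing in flight: submit again

// Decide how to treat each job after a restart or a detected stall.
// Re-adopting keeps work that is still running remotely instead of paying for it twice.
function planRecovery(jobs: Job[], now: number, stallMs: number): Decision[] {
  return jobs.map((job): Decision => {
    if (job.status !== "running" && job.status !== "queued") return { kind: "leave", job };
    const stalled = now - job.lastHeartbeat > stallMs;
    if (!stalled) return { kind: "leave", job };
    return job.remoteId ? { kind: "readopt", job } : { kind: "resubmit", job };
  });
}
```

Keying the decision on whether a remote job id was recorded is what lets a restarted orchestrator pick up a clip mid-render rather than dropping it.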

FUR AND CIRCUITS -- scored, music bed (Seedance cloud i2v)

FUR AND CIRCUITS: eight shots, scored with a generated music bed (MiniMax Music module). Motion on Seedance cloud i2v; two character LoRAs trained from cast portraits. MP4 download: 43 MB.

The scored mode: the picture locks and the Audio step attaches a generated music bed, beat-synced to the edit. The music is generated, not licensed -- produced by the MiniMax Music module, staged to R2, and muxed into the final MP4. The whole pipeline, including scoring, ran unattended.
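Beat-syncing the edit can be as simple as snapping each cut point to the nearest beat, within a tolerance so that no shot is stretched or trimmed too far. The sketch below is an illustrative assumption, not the scoring module's actual algorithm; it takes beat times (in seconds) as given, since beat detection itself is out of scope here.

```typescript
// Snap each cut time (seconds) to the nearest beat if one lies within `maxShift` seconds.
// Cuts with no beat close enough stay where the edit put them.
// On a tie between two beats, the earlier beat wins.
function snapCutsToBeats(cuts: number[], beats: number[], maxShift: number): number[] {
  const sorted = [...beats].sort((a, b) => a - b);
  return cuts.map((cut) => {
    let best = cut;
    let bestDist = Infinity;
    for (const beat of sorted) {
      const dist = Math.abs(beat - cut);
      if (dist < bestDist) {
        bestDist = dist;
        best = beat;
      }
    }
    return bestDist <= maxShift ? best : cut;
  });
}
```

For example, snapCutsToBeats([1.1, 2.45, 5.0], [0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0], 0.15) moves the first two cuts onto beats and leaves the last one alone. A production version would also need to stop two close cuts from collapsing onto the same beat.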

RUST -- narrated, TTS (Kling cloud i2v)

RUST: three shots, narrated with TTS (MiniMax Speech module). Motion on Kling cloud i2v; two character LoRAs (Salvage Robot and Companion Robot). MP4 download: 33 MB.

The narrated mode: TTS reads the script over the cut, no music bed. Generated by the MiniMax Speech module directly from the storyboard text, staged to R2, and muxed into the final MP4. Narration is a drop-in alternative to the music bed in the same scoring chain.
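Muxing a narration or music track into the locked picture does not require re-encoding the video. A minimal sketch of ffmpeg arguments for that step follows; it is an assumption about how such a mux can be done, not the backend's actual command, and the file names are placeholders. The flags themselves (-map, -c:v copy, -c:a aac, -shortest) are standard ffmpeg options.

```typescript
// Build an ffmpeg argument list that copies the video stream untouched and
// adds one audio track (music bed or TTS narration) encoded as AAC.
function muxAudioArgs(videoPath: string, audioPath: string, outPath: string): string[] {
  return [
    "-y",             // overwrite the output if it already exists
    "-i", videoPath,  // input 0: the silent assembled picture
    "-i", audioPath,  // input 1: the generated audio
    "-map", "0:v:0",  // take the first video stream from input 0
    "-map", "1:a:0",  // take the first audio stream from input 1
    "-c:v", "copy",   // no video re-encode
    "-c:a", "aac",    // AAC audio for broad MP4 compatibility
    "-shortest",      // stop at the end of the shorter input
    outPath,
  ];
}
```

In Node this could be run with spawn("ffmpeg", muxAudioArgs("film_silent.mp4", "score.wav", "film.mp4")) from node:child_process. Note that -shortest trims the picture if the audio is shorter; drop it if every frame must survive.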

Vivijure Speaks -- talking, lip-sync + upscale (own-GPU Wan i2v)

Vivijure Speaks: two shots, about two and a half seconds, 1080p. A talking character lip-synced to its own dialogue, then upscaled: per-shot dialogue TTS, the MuseTalk lip-sync module, and a CUDA Real-ESRGAN pass over an interpolated clip. Motion on a self-hosted GPU (the own-gpu Wan I2V backend).

The talking mode: per shot, a generated line of dialogue is muxed into the clip and MuseTalk drives the character's mouth to match it. The first render came out silent; after that fix, a from-scratch re-run surfaced two more orchestration bugs (a backend phantom keyframe and a finish-step wedge) before any user could hit them. The project's writeup tells the full three-fix story.

Ecosystem

slate --> vivijure --> vivijure-backend

| Repo | Role |
|---|---|
| slate | Collaborative AI screenwriter Discord bot -- shapes the film in-channel, then hands it to vivijure to render |
| vivijure | AI film studio control plane (Cloudflare Worker) -- planner, cast, render UI; orchestrates render jobs |
| vivijure-backend | GPU render backend (RunPod serverless) -- SDXL keyframes, i2v, finish, assemble |

Storyboard planner -- write scenes, edit shot prompts, and set per-shot cast assignments before bundling.

Cast -- register characters with portraits and visual bibles; Slate syncs here directly from Discord.

Module host -- installed modules appear here; each stage (plan, cast, keyframe, motion, finish, score) is served by a swappable module worker.

Render history -- honest per-render status. The panel surfaces real failed attempts alongside completed renders, with inline error snippets (the example shown had three failed runs and one completed). It shows what actually happened, not a curated success.

What you can do

Write a storyboard -- scenes, shot descriptions, character beats -- in the planner.

Generate SDXL keyframes per shot on your GPU (preview before committing to full motion).

Animate each shot with Wan 2.2 I2V on your own GPU, or any of six cloud motion backends (Kling, Seedance, MiniMax Hailuo, Google Veo, Vidu Q3, Wan 2.6) -- seven in all, mix and match per shot, any aspect ratio.

Cast characters -- upload portraits, generate LoRA training sets, train a character LoRA on your GPU so your cast looks consistent across shots.

Score the film -- attach a music bed, narrate it with TTS, or beat-sync cuts.

Give characters a voice -- generate per-shot dialogue, lip-sync it with MuseTalk, and upscale the result with CUDA Real-ESRGAN, as opt-in finish modules over the same motion path.

Download the assembled silent MP4 or mux in audio without touching the GPU at all.

Everything beyond keyframes uses your own R2 bucket for artifacts; you are never renting storage from us.

Why not just use a SaaS?

Because you run Proxmox. Because you have a V100 or an H100 and you do not want to pay $0.80 a second to someone else's GPU. Because you want to swap the motion model, adjust the sampler, and not file a support ticket to do it.

Vivijure is for the creative homelabber who is priced out of subscription AI video tools and prefers to own the stack. The control plane is on Cloudflare's free tier (no server to run); the GPU work hits whatever endpoint you point it at; the artifacts land in your R2 bucket.

Quick start

1. Clone and install

```bash
git clone https://github.com/skyphusion-labs/vivijure
cd vivijure
npm install
```

2. Configure

Edit wrangler.toml: add your R2 bucket, D1 database, and module service bindings.

Set secrets (RunPod key, CF Access token for R2, AI Gateway) via wrangler secret put.

3. Develop locally

```bash
npm run dev  # wrangler dev -- hot reload at localhost:8787
```

4. Deploy

```bash
npm run deploy  # wrangler deploy
```

See CLAUDE.md for conventions and docs/module-authoring.md for how to write your own module worker.

Architecture

Vivijure is a module host, not a monolith. The core worker owns what is always true -- project, storyboard, cast, bundle assembly, render orchestration, and a module registry. Every capability beyond that is an opt-in module worker plugged into the pipeline through a typed hook contract.

Install only the modules you want. The studio UI assembles itself from GET /api/modules -- it never hardcodes a feature section. Install none and you get a clean, empty studio.
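Because the UI is driven by the module list rather than hardcoded sections, the client side can be as simple as grouping installed modules by the hook they serve and rendering one section per non-empty group. The response shape assumed here (an array of objects with name and hooks) is hypothetical; the real shape is defined by the module contract in src/modules/types.ts.

```typescript
// Hypothetical shape of one entry returned by GET /api/modules.
interface InstalledModule {
  name: string;
  hooks: string[]; // e.g. "keyframe", "motion.backend", "score"
}

// Group modules by hook so the UI can render one section per hook that has modules.
// An empty input yields an empty map: no sections, i.e. a clean, empty studio.
function sectionsByHook(modules: InstalledModule[]): Map<string, string[]> {
  const sections = new Map<string, string[]>();
  for (const mod of modules) {
    for (const hook of mod.hooks) {
      const names = sections.get(hook) ?? [];
      names.push(mod.name);
      sections.set(hook, names);
    }
  }
  return sections;
}

// Fetch the installed modules from the studio's API and group them.
async function loadSections(baseUrl: string): Promise<Map<string, string[]>> {
  const res = await fetch(`${baseUrl}/api/modules`);
  if (!res.ok) throw new Error(`GET /api/modules failed: ${res.status}`);
  return sectionsByHook((await res.json()) as InstalledModule[]);
}
```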

```
core (this worker)
|-- keyframe hook  --> your SDXL keyframe module (GPU)
|-- motion.backend --> GPU i2v module OR cloud motion module (per shot)
|-- finish         --> interpolation / upscale / lip-sync (optional chain)
|-- score          --> music / narration / beat-sync (optional chain)
|-- plan.enhance   --> LLM auto-direction before render (optional)
|-- cast.image     --> portrait -> LoRA training set (optional)
'-- notify         --> render-done email / webhook (optional)
```
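The tree distinguishes single-provider hooks (one motion.backend per shot) from optional chains (finish, score), where each installed module takes the previous module's output. A chain reduces to sequential composition, and an empty chain is a pass-through. The sketch below is hypothetical: the HookName union mirrors the tree, but the Artifact and ChainModule shapes and runChain are illustrative names, not the real contract in src/modules/types.ts.

```typescript
// Hook names as listed in the architecture tree; the union itself is illustrative.
type HookName =
  | "keyframe" | "motion.backend" | "finish" | "score"
  | "plan.enhance" | "cast.image" | "notify";

// Hypothetical artifact passed between chain steps: a pointer to an object in R2.
interface Artifact {
  key: string;     // R2 object key of the current clip or film
  steps: string[]; // names of the modules that have processed it so far
}

// Hypothetical in-process view of an installed module.
interface ChainModule {
  name: string;
  hook: HookName;
  run(input: Artifact): Promise<Artifact>;
}

// Run an optional chain (e.g. finish: interpolate, then lip-sync, then upscale) in order.
// With no modules installed for the hook, the input comes back unchanged.
async function runChain(hook: HookName, modules: ChainModule[], input: Artifact): Promise<Artifact> {
  let current = input;
  for (const mod of modules.filter((m) => m.hook === hook)) {
    current = await mod.run(current);
  }
  return current;
}
```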

The module contract is vivijure-module/1 in src/modules/types.ts. A module is a Cloudflare Worker that serves GET /module.json (manifest) and POST /invoke (run a hook). That is the whole interface: a module written in another language, or hosted on another platform, works fine as long as it speaks JSON over HTTP.

See docs/module-api.md for the full contract and docs/module-authoring.md for the step-by-step guide.
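Since a module only has to answer two routes, a minimal one fits in a few lines. The sketch below is hypothetical: the manifest fields (name, contract, hooks), the example-overlay name, and the /invoke request and response bodies are placeholders, and docs/module-api.md defines the real vivijure-module/1 schema. It uses only the standard Request/Response APIs, so the same code also runs under Node 18+.

```typescript
// Placeholder manifest; the real vivijure-module/1 fields are in docs/module-api.md.
const manifest = {
  name: "example-overlay",
  contract: "vivijure-module/1",
  hooks: ["finish"],
};

// Hypothetical /invoke body: which hook to run and its input.
interface InvokeRequest {
  hook: string;
  input: unknown;
}

function json(body: unknown, status = 200): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);

    if (request.method === "GET" && pathname === "/module.json") {
      return json(manifest);
    }

    if (request.method === "POST" && pathname === "/invoke") {
      let body: InvokeRequest;
      try {
        body = (await request.json()) as InvokeRequest;
      } catch {
        return json({ error: "invalid JSON" }, 400);
      }
      if (!manifest.hooks.includes(body.hook)) {
        return json({ error: `unsupported hook: ${body.hook}` }, 400);
      }
      // A real module would do its work here; this one echoes the input back unchanged.
      return json({ output: body.input });
    }

    return new Response("not found", { status: 404 });
  },
};

export default worker;
```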

The GPU render backend is vivijure-backend (RunPod serverless, SDXL + Wan I2V + ffmpeg assemble). The studio UI lives at vivijure.skyphusion.org (/planner, /cast, /modules).

How a render flows

The path from a storyboard to a finished film.mp4. The keyframe fans into both the dialogue and the motion backend; any of seven motion backends (own-GPU or cloud) renders the clip; the opt-in finish chain interpolates, lip-syncs, and upscales it; then the shots gather, assemble, and mux. Drawn out, it is a real studio pipeline, not a wrapper.

```mermaid
flowchart LR
  SB([Storyboard]) --> KF[Keyframe SDXL on GPU]
  KF --> DLG[Dialogue per-shot TTS]
  KF --> MB{motion.backend}
  MB -->|own-gpu| WAN[Wan i2v on your GPU]
  MB -->|cloud| CLD[Kling / Wan 2.6 / Seedance / Hailuo / Veo / Vidu]
  WAN --> RIFE
  CLD --> RIFE
  subgraph FIN [finish chain, opt-in]
    RIFE[RIFE interpolate] --> LS[MuseTalk lip-sync] --> UP[CUDA Real-ESRGAN upscale] --> OV[text overlay]
  end
  DLG --> LS
  OV --> ASM[Gather + assemble keepClipAudio]
  ASM --> MUX[Mux audio] --> FILM[(film.mp4)]
```


Motion is backend-agnostic: the same keyframe feeds own-GPU Wan or any cloud i2v module, and the finish chain runs the same way over whatever clip comes back. The dialogue track is generated per shot, drives the lip-sync, and rides through assembly into the final mux.
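The fan-out in the diagram, where one keyframe feeds both the dialogue track and the motion backend, maps naturally onto running the two concurrently and joining before the finish chain, which needs both. The sketch below illustrates only that ordering. Every name in it (Shot, Stage, renderShot, and the backend ids own-gpu-wan, kling, seedance) is hypothetical, and each Stage method is a stand-in for a module invocation.

```typescript
// Hypothetical per-shot description: the storyboard picks a motion backend per shot.
interface Shot {
  id: number;
  prompt: string;
  line?: string; // optional dialogue for this shot
  motion: "own-gpu-wan" | "kling" | "seedance";
}

interface ShotResult {
  clip: string;      // key of the finished clip
  dialogue?: string; // key of the dialogue audio, if any
}

// Hypothetical stand-ins for module calls (keyframe, TTS, motion, finish chain).
type Stage = {
  keyframe(shot: Shot): Promise<string>;
  dialogue(shot: Shot, keyframe: string): Promise<string | undefined>;
  motion(backend: Shot["motion"], keyframe: string): Promise<string>;
  finish(clip: string, dialogue: string | undefined): Promise<string>;
};

async function renderShot(shot: Shot, stage: Stage): Promise<ShotResult> {
  const keyframe = await stage.keyframe(shot);
  // Fan out: dialogue and motion both start from the same keyframe and run concurrently.
  const [dialogue, rawClip] = await Promise.all([
    stage.dialogue(shot, keyframe),
    stage.motion(shot.motion, keyframe),
  ]);
  // Join: the finish chain (interpolate, lip-sync, upscale) needs the clip and the dialogue.
  const clip = await stage.finish(rawClip, dialogue);
  return { clip, dialogue };
}
```

Because the backend choice is just a field on the shot, mixing own-GPU and cloud motion within one film needs no change to this flow.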

Develop

```bash
npm run typecheck  # tsc --noEmit (CI gate -- run before pushing)
npm test           # vitest
npm run dev        # wrangler dev
npm run deploy     # wrangler deploy
```

account_id comes from CLOUDFLARE_ACCOUNT_ID in the environment, not hardcoded. All bindings are in wrangler.toml (committed); secrets go in via wrangler secret put.
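A missing binding or secret otherwise tends to surface only when a render reaches the step that needs it. One way to fail fast is to check the Worker's env object up front. This is a minimal sketch; the binding and secret names below (ARTIFACTS, DB, RUNPOD_API_KEY, AI_GATEWAY_TOKEN) are hypothetical placeholders, not the names Vivijure's wrangler.toml actually declares.

```typescript
// Hypothetical names: replace with the bindings and secrets your wrangler.toml declares.
const REQUIRED_BINDINGS = ["ARTIFACTS", "DB"] as const;                   // R2 bucket, D1 database
const REQUIRED_SECRETS = ["RUNPOD_API_KEY", "AI_GATEWAY_TOKEN"] as const; // set via wrangler secret put

// Return the names of anything missing from the Worker's env object.
function missingConfig(env: Record<string, unknown>): string[] {
  const missing: string[] = [];
  for (const name of REQUIRED_BINDINGS) {
    if (env[name] === undefined || env[name] === null) missing.push(name);
  }
  for (const name of REQUIRED_SECRETS) {
    const value = env[name];
    if (typeof value !== "string" || value.length === 0) missing.push(name);
  }
  return missing;
}
```

Calling this at the top of the fetch handler and returning a 500 that lists the missing names turns a mid-render failure into an immediate, readable one.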

License

AGPL-3.0. Free as in yours.

About

Vivijure Studio: a module host for AI film production (AGPL). Thin Cloudflare Worker core + opt-in module workers behind a typed hook contract.

Topics: storyboard, self-hosted, agpl, filmmaking, homelab, video-generation, image-to-video, cloudflare-workers, text-to-video, llm, runpod, stable-diffusion, generative-ai, ai-video.


Repository activity: 1 star, 0 watching, 0 forks; releases: 6 tags; packages: 0.


Languages: TypeScript 68.4%, JavaScript 19.2%, Python 5.3%, HTML 3.9%, CSS 2.6%, Dockerfile 0.5%, Shell 0.1%.