Show HN: Sipp – Run small local LLMs in browser 3x faster
Sipp is a new open-source WebGPU runtime that runs small LLMs in the browser up to 3x faster than alternatives, with zero install and a unified API for local and cloud inference.
Blazing-fast WebGPU runtime · Open Source
AI inference,
The fastest runtime for the web. Run models right in the browser with zero install and zero dependencies. Build games, agents, and vision and chat apps. Add a secure cloud gateway when you need it, all from one client.
Get Started →
Open source · Zero install · Local + gateway
Now with CUDA · Vulkan · Metal
Sipp Pink Lemonade Ed.
Net wt. 1 SDK
100% real inference
Nutrition Facts
Serving size: 1 Sipp Client
Amount per pour
Dependencies: 0
Cold start: < 1000 ms
Engine: Rust · C++ · GGML
Backend: Browser-native WebGPU
% Dev Value
Open source: 100%
Type-safe: 100%
Framework sludge: 0%
$ npm install @sipphq/sipp
Runs in your browser · Fastest WebGPU runtime · Build games · Agents & bots · Vision & chat · Zero install · Local + gateway · Fully Open Source
Ingredients
WebGPU · WASM · Rust · C++ · GGUF · TypeScript · 100% real tokens. Contains no frameworks, no concentrate, no added sugar.
Taste test
Easy Setup. One Simple API.
Manage and query multiple inference endpoints through a single, unified API. Switch or split traffic between local browser execution and cloud gateways without rewriting your code.
Identical Code Paths: Execute queries symmetrically across edge and cloud endpoints.
Multi-Endpoint Control: Register local and remote models under one unified client.
Native Performance: Tap local WebGPU execution or cloud gateways with equal ease.
recipe.ts
import { SippClient } from '@sipphq/sipp';

// One client. Pour in the browser or from the cloud.
const blender = new SippClient();

// Run in the browser on WebGPU (or go native: CUDA · Vulkan · Metal)
const juice = await blender.add('edge', {
  kind: 'local',
  source: '/models/llama3.gguf',
});

// ...or pour from a secure cloud gateway. Same interface, either way.
const ice = await blender.add('cloud', {
  kind: 'gateway',
  baseUrl: 'https://gateway.example.com/v1/',
});

// Stream inference from either endpoint with one symmetric API
const [smoothie, snowcone] = await Promise.all([
  blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
  blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice }),
]);
✶ Same symmetric API, local or cloud.
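One way to decide between the local and gateway endpoints at runtime is to probe for WebGPU first. The sketch below is not part of the Sipp API: `pickEndpoint` and `GpuCapableNavigator` are hypothetical names, and the only real browser call it relies on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable adapter is available. The returned names `'edge'` and `'cloud'` simply mirror the endpoint names used in recipe.ts.

```typescript
// Hypothetical helper (not a Sipp API): choose an endpoint name based on
// whether the current environment can actually provide a WebGPU adapter.

// Minimal structural type for the part of `navigator` this sketch needs,
// so it compiles without WebGPU type definitions installed.
interface GpuCapableNavigator {
  gpu?: { requestAdapter(): Promise<object | null> };
}

type EndpointName = 'edge' | 'cloud';

async function pickEndpoint(
  nav: GpuCapableNavigator | undefined,
): Promise<EndpointName> {
  // No `navigator.gpu` at all: WebGPU is not exposed here.
  if (!nav?.gpu) return 'cloud';
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'edge' : 'cloud';
  } catch {
    // Treat any failure while probing as "no local GPU available".
    return 'cloud';
  }
}
```

In a browser you would pass `navigator` (cast as needed if WebGPU typings are not installed) and use the result to choose which registered endpoint receives the chat call.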
Benchmark · WebGPU showdown
Same model. Faster in the browser.
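Sipp's WebGPU backend runs the same weights several times faster than other browser runtimes: in the benchmark below, 5.4× to 8.4× faster to first token and 3.3× to 3.5× faster end to end. No native install. Pick a model and watch the multipliers stack up.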
Run the benchmark
Mobile support is currently being worked on. Try demos on desktop.
Sipp vs Transformers.js
TTFT: 8.4× faster
Decode: 3.8× tok/s
E2E latency: 3.5× faster

Sipp vs WebLLM
TTFT: 5.4× faster
Decode: 3.5× tok/s
E2E latency: 3.3× faster
Measured on Qwen 2.5 0.5B · Q4_K_M. LILO · 1024 in / 512 out · NVIDIA 3080 · Chrome (N=3, 9 runs, 1 warmup). Multipliers show how many times faster Sipp runs vs each browser runtime.
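For readers who want to reproduce multipliers like these, here is a minimal sketch of how they could be computed from raw timings. The page does not say how runs are aggregated, so everything here is an assumption: the `RunTiming` shape, the use of the median, dropping one warmup run, and defining decode throughput as output tokens divided by the time after the first token.

```typescript
// Hypothetical timing record for one benchmark run (assumed shape).
interface RunTiming {
  ttftMs: number;       // time to first token
  totalMs: number;      // end-to-end latency for the whole request
  outputTokens: number; // tokens generated in this run
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Summarize one runtime's runs, skipping the first `warmup` runs.
function summarize(runs: RunTiming[], warmup = 1) {
  const measured = runs.slice(warmup);
  return {
    ttftMs: median(measured.map((r) => r.ttftMs)),
    e2eMs: median(measured.map((r) => r.totalMs)),
    // Assumed definition: tokens per second after the first token arrives.
    decodeTokPerSec: median(
      measured.map((r) => r.outputTokens / ((r.totalMs - r.ttftMs) / 1000)),
    ),
  };
}

// Multipliers of `candidate` relative to `baseline`; > 1 means faster.
function speedups(candidate: RunTiming[], baseline: RunTiming[]) {
  const c = summarize(candidate);
  const b = summarize(baseline);
  return {
    ttft: b.ttftMs / c.ttftMs,                      // latency: lower is better
    decode: c.decodeTokPerSec / b.decodeTokPerSec,  // throughput: higher is better
    e2e: b.e2eMs / c.e2eMs,                         // latency: lower is better
  };
}
```

Latency metrics divide baseline by candidate while throughput divides candidate by baseline, so every multiplier reads the same way: larger means the candidate is faster.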
Live demo · Fresh squeeze
Pick a model. Sip the tokens.
A bare-bones chat running 100% in your browser. Pick a model, start the tap, and then chat. No account, no server.
Try the full demo
Nothing downloads until you start. Weights are cached after the first pour.
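The "download once, then reuse" behavior described here can be sketched as a small cache-first loader. This is not Sipp's implementation: `WeightStore`, `MemoryStore`, `loadWeights`, and `fetchBytes` are hypothetical names, and the in-memory `Map` stands in for whatever persistent storage a browser runtime might use (for example the Cache API or IndexedDB).

```typescript
// Hypothetical storage interface; a browser build could back this with the
// Cache API or IndexedDB instead of memory.
interface WeightStore {
  get(url: string): Promise<Uint8Array | undefined>;
  set(url: string, bytes: Uint8Array): Promise<void>;
}

// In-memory stand-in so the sketch runs anywhere.
class MemoryStore implements WeightStore {
  private readonly items = new Map<string, Uint8Array>();
  async get(url: string): Promise<Uint8Array | undefined> {
    return this.items.get(url);
  }
  async set(url: string, bytes: Uint8Array): Promise<void> {
    this.items.set(url, bytes);
  }
}

type Downloader = (url: string) => Promise<Uint8Array>;

// Default downloader built on the standard fetch API.
const fetchBytes: Downloader = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`download failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
};

// Cache-first load: hit the network only when the store has no copy.
async function loadWeights(
  url: string,
  store: WeightStore,
  download: Downloader = fetchBytes,
): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached) return cached;          // later loads: no network
  const bytes = await download(url);  // first load only
  await store.set(url, bytes);
  return bytes;
}
```

Calling `loadWeights` only when the user starts the demo matches the "nothing downloads until you start" behavior, and a second call with the same URL is served from the store.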
Built with Sipp · 100% in-browser
Pour it into anything.
Real apps running real models with Sipp. No servers, no install, no waiting. Every one runs the model right in your browser.
Game · Desktop only
🪄 PromptCast ↗
A wizard duel where every spell is generated on the fly by a local LLM. No two casts the same.
Play in browser ›
Agents · Desktop only
🍌 Banana Brawl ↗
A swarm of little agents reason in-browser, each running a local model to pick its next move, all fighting for one banana.
Play in browser ›
Vision · Desktop only
🎨 Sketch Critic ↗
Draw something and a local vision model snapshots the canvas, reads it, and gives you live feedback.
Play in browser ›
Chat · Desktop only
💬 Aria ↗
Chat with a VRM character whose emotes, actions, and replies are all chosen live by a local model.
Play in browser ›
One client · every runtime
Start in the browser. Scale anywhere.
Sipp leads with the fastest runtime on the web. The same client API follows you to Node, Rust, Python, and a self-hosted gateway.
Featured · WebGPU · zero install
Browser
Run model weights in the browser on WebGPU. No servers and no dependencies, just pure bliss.
$ npm install @sipphq/sipp
Read the browser docs ›
Node
Server
Server-side inference and framework route handlers in any Node runtime.
$ npm install @sipphq/sipp-server
Read the docs ›
Rust
Native
Native apps and services built on the sipp crate.
$ cargo add sipp-rs
Read the docs ›
Python
Wheels
Local and gateway inference from Python, with bare-metal backends for fast compute.
Wheels via GitHub
Read the docs ›
Gateway Server
Self-host
One HTTP boundary that owns your keys, routing, policies, and metrics.
Build from source today
Read the docs ›
Need managed infrastructure for production workloads?
Commercial solutions ›
Fresh batch ready
Pour your first inference.
Install Sipp, run a model in your browser on WebGPU, then scale to Node, Rust, Python, or your own gateway.
Get Started →
Join Discord →