Show HN: Sipp – Run small local LLMs in browser 3x faster

Sipp is a new open-source WebGPU runtime that runs small LLMs in the browser. In its published benchmark it decodes tokens and completes requests more than 3x faster than Transformers.js and WebLLM, with zero install and a unified API for local and cloud inference.

Source: Hacker News AI · Author: jjhartmann

Blazing-fast WebGPU runtime · Open Source

AI inference,

The fastest runtime for the web. Run models right in the browser with zero install and zero dependencies. Build games, agents, vision, and chat. Add a secure cloud gateway when you need it, all from one client.


Open source · Zero install · Local + gateway

Now with CUDA · VULKAN · METAL

Sipp · Pink Lemonade Ed.

Net wt. 1 SDK

100% real inference

Nutrition Facts

Serving size: 1 Sipp Client

Amount per pour

Dependencies: 0

Cold start: < 1000 ms

Engine: Rust · C++ · GGML

Backend: Browser-native WebGPU

% Dev Value

Open source: 100%

Type-safe: 100%

Framework sludge: 0%

$ npm install @sipphq/sipp

Runs in your browser · Fastest WebGPU runtime · Build games · Agents & bots · Vision & chat · Zero install · Local + gateway · Fully Open Source


Ingredients

WebGPU · WASM · Rust · C++ · GGUF · TypeScript · 100% real tokens. Contains no frameworks, no concentrate, no added sugar.

Taste test

Easy Setup. One Simple API.

Manage and query multiple inference endpoints through a single, unified API. Switch or split traffic between local browser execution and cloud gateways without rewriting your code.

Identical Code Paths: Execute queries symmetrically across edge and cloud endpoints.

Multi-Endpoint Control: Register local and remote models under one unified client.

Native Performance: Tap local WebGPU execution or cloud gateways with equal ease.

recipe.ts

import { SippClient } from '@sipphq/sipp';

// One client. Pour in the browser or from the cloud.
const blender = new SippClient();

// Run in the browser on WebGPU (or go native: CUDA · Vulkan · Metal)
const juice = await blender.add('edge', {
  kind: 'local',
  source: '/models/llama3.gguf',
});

// ...or pour from a secure cloud gateway. Same interface, either way.
const ice = await blender.add('cloud', {
  kind: 'gateway',
  baseUrl: 'https://gateway.example.com/v1/',
});

// Stream inference from either endpoint with one symmetric API
const [smoothie, snowcone] = await Promise.all([
  blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
  blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice }),
]);

✶ Same symmetric API, local or cloud.
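
The page says traffic can be switched or split between local and cloud endpoints but does not show how. One way this could look on the application side is sketched below. Everything in it is hypothetical: `TrafficSplit`, `pickEndpoint`, and the `localShare` field are illustration names, not part of Sipp's API. Only the endpoint names 'edge' and 'cloud' are borrowed from the recipe above.

```typescript
// Hypothetical application-side helper; not part of the Sipp SDK.
type EndpointName = 'edge' | 'cloud';

interface TrafficSplit {
  // Fraction of requests (0 to 1) that should run locally in the browser.
  localShare: number;
}

// Choose an endpoint for a single request.
// `roll` is a number in [0, 1); it is a parameter so the choice can be
// made deterministic in tests instead of always using Math.random().
function pickEndpoint(
  split: TrafficSplit,
  roll: number = Math.random(),
): EndpointName {
  if (split.localShare < 0 || split.localShare > 1) {
    throw new RangeError('localShare must be between 0 and 1');
  }
  return roll < split.localShare ? 'edge' : 'cloud';
}
```

With `localShare: 0.7`, about 70% of requests would be sent to the browser endpoint and the rest to the gateway. The returned name could then select which registered endpoint is passed to the chat call.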

Benchmark · WebGPU showdown

Same model. Faster in the browser.

Sipp's WebGPU backend runs the same weights 3.3× to 3.5× faster end to end than other browser runtimes in the benchmark below, with time to first token up to 8.4× faster. No native install. Pick a model and watch the multipliers stack up.


Mobile support is currently being worked on. Try demos on desktop.

Sipp vs Transformers.js

TTFT: 8.4× faster
Decode: 3.8× tok/s
E2E latency: 3.5× faster

Sipp vs WebLLM

TTFT: 5.4× faster
Decode: 3.5× tok/s
E2E latency: 3.3× faster

Measured on Qwen 2.5 0.5B · Q4_K_M. LILO · 1024 in / 512 out · NVIDIA 3080 · Chrome (N=3, 9 runs, 1 warmup). Multipliers show how many times faster Sipp runs vs each browser runtime.
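
The footnote gives the setup but not how the multipliers are derived. Here is a minimal sketch of one plausible derivation, under three assumptions: warmup runs are discarded, the remaining runs are aggregated by median, and decode speed is counted over the tokens after the first. The page does not confirm that Sipp's harness works this way. All type and function names are hypothetical.

```typescript
// One benchmark run for one runtime. Field names are illustrative.
interface RunSample {
  ttftMs: number;       // time to first token, in milliseconds
  totalMs: number;      // end-to-end latency, in milliseconds
  outputTokens: number; // number of generated tokens
}

interface Summary {
  ttftMs: number;
  decodeTokPerSec: number;
  e2eMs: number;
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error('median of empty list');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Decode throughput: tokens after the first, over the time after the first.
function decodeTokPerSec(s: RunSample): number {
  return ((s.outputTokens - 1) / (s.totalMs - s.ttftMs)) * 1000;
}

// Drop the warmup runs, then take the median of each metric.
function summarize(runs: RunSample[], warmup: number = 1): Summary {
  const measured = runs.slice(warmup);
  return {
    ttftMs: median(measured.map((r) => r.ttftMs)),
    decodeTokPerSec: median(measured.map(decodeTokPerSec)),
    e2eMs: median(measured.map((r) => r.totalMs)),
  };
}

// Multipliers above 1 mean the candidate is faster than the baseline.
// Latencies divide baseline by candidate; throughput divides the other way.
function multipliers(candidate: Summary, baseline: Summary) {
  return {
    ttft: baseline.ttftMs / candidate.ttftMs,
    decode: candidate.decodeTokPerSec / baseline.decodeTokPerSec,
    e2e: baseline.e2eMs / candidate.e2eMs,
  };
}
```

Latency metrics and throughput metrics are inverted relative to each other, which is why the table labels TTFT and E2E as "faster" and decode as a tok/s ratio.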

Live demo · Fresh squeeze

Pick a model. Sip the tokens.

A bare-bones chat running 100% in your browser. Pick a model, start the tap, and then chat. No account, no server.


Nothing downloads until you start. Weights are cached after the first pour.


Built with Sipp · 100% in-browser

Pour it into anything.

Real apps running real models with Sipp. No servers, no install, no waiting. Every one runs the model right in your browser.


Game · Local · 🪄 PromptCast (live demo, desktop only)

A wizard duel where every spell is generated on the fly by a local LLM. No two casts the same.

Agents · Local · 🍌 Banana Brawl (live demo, desktop only)

A swarm of little agents reason in-browser, each running a local model to pick its next move, all fighting for one banana.

Vision · Local · 🎨 Sketch Critic (live demo, desktop only)

Draw something and a local vision model snapshots the canvas, reads it, and gives you live feedback.

Chat · Local · 💬 Aria (live demo, desktop only)

Chat with a VRM character whose emotes, actions, and replies are all chosen live by a local model.

One client · every runtime

Start in the browser. Scale anywhere.

Sipp leads with the fastest runtime on the web. The same client API follows you to Node, Rust, Python, and a self-hosted gateway.

Featured · WebGPU · zero install

Browser

Run model weights in the browser on WebGPU. No servers and no dependencies, just pure bliss.

$ npm install @sipphq/sipp
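
Because the browser path depends on WebGPU, an app may want to check for it before registering a local endpoint and use a gateway otherwise. A minimal sketch of that check, assuming the 'local' and 'gateway' kind values from the recipe above: `preferredKind` and `NavigatorLike` are hypothetical names, and the page does not say whether Sipp performs a similar check internally.

```typescript
// Endpoint kinds as they appear in the recipe sample.
type EndpointKind = 'local' | 'gateway';

// The only part of the browser's `navigator` this sketch reads.
interface NavigatorLike {
  gpu?: unknown;
}

// Decide which kind of endpoint to register first.
// `navigator.gpu` is defined only where the browser exposes WebGPU.
// Its presence is necessary but not sufficient: requestAdapter() can
// still resolve to null on unsupported hardware, so a real app should
// also handle a failed local start.
function preferredKind(nav: NavigatorLike | undefined): EndpointKind {
  return nav !== undefined && nav.gpu !== undefined ? 'local' : 'gateway';
}
```

In a page, the browser's own `navigator` object would be passed in. Taking it as a parameter keeps the function usable in environments where `navigator` does not exist.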

Node

Server

Server-side inference and framework route handlers in any Node runtime.

$ npm install @sipphq/sipp-server

Rust

Native

Native apps and services built on the sipp crate.

$ cargo add sipp-rs

Python

Wheels

Local and gateway inference from Python, with bare-metal backends for fast compute.

Wheels via GitHub


Gateway Server

Self-host

One HTTP boundary that owns your keys, routing, policies, and metrics.

Build from source today


Need managed infrastructure for production workloads?

Commercial solutions ›

Fresh batch ready

Pour your first inference.

Install Sipp, run a model in your browser on WebGPU, then scale to Node, Rust, Python, or your own gateway.
