AI News Hub

I removed the vector database from my AI agent stack

Moss is a sub-10 ms semantic search runtime for conversational AI agents. It removes the remote vector database from the query path by running search and retrieval in-process, which brings query latency down to single-digit milliseconds. It supports hybrid search, built-in embeddings, metadata filtering, and a WebAssembly build for browser use. In the project's benchmarks on 100,000 documents, Moss's P50 latency is 3.1 ms versus 432.6 ms for Pinecone.

Source: Hacker News AI · Author: philosopherr

Moss is a sub-10 ms semantic search runtime built for conversational AI agents. Hybrid retrieval (semantic + keyword search), built-in embeddings, metadata filtering, and a WebAssembly build that runs in the browser - all from a single SDK that embeds in your application.

No network hop on the hot path. No clusters to tune. Point the SDK at Moss Cloud, load your index, and query it in under 10 ms. SDKs are available for Python, TypeScript, Elixir, and C.

Quickstart

Before you start: sign up at moss.dev to get a project_id and project_key (a free tier is available).

The snippets below need Python 3.10+ or Node.js 20+.

Python

pip install moss

from moss import MossClient, QueryOptions

client = MossClient("your_project_id", "your_project_key")

# Create an index and add documents
await client.create_index("support-docs", [
    {"id": "1", "text": "Refunds are processed within 3-5 business days."},
    {"id": "2", "text": "You can track your order on the dashboard."},
    {"id": "3", "text": "We offer 24/7 live chat support."},
])

# Load and query: results in […]

TypeScript

[…] {
  console.log(`[${doc.score.toFixed(3)}] ${doc.text}`); // Returned in ${results.timeTakenInMs}ms
});
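The Python quickstart calls use `await`, so they have to run inside a coroutine (or an async REPL) rather than at the top level of a plain script. A minimal sketch of that wrapping follows. It relies only on the `create_index` call shown in the quickstart; `StubMossClient` is a hypothetical stand-in so the example runs without credentials, and in real code it would presumably be replaced by `MossClient(project_id, project_key)`.

```python
import asyncio


class StubMossClient:
    """Hypothetical stand-in for moss.MossClient; it only mimics create_index."""

    def __init__(self, project_id: str, project_key: str) -> None:
        self.project_id = project_id
        self.project_key = project_key
        self.indexes: dict[str, list[dict]] = {}

    async def create_index(self, name: str, docs: list[dict]) -> None:
        # The real SDK would send the documents to Moss Cloud; the stub stores them.
        self.indexes[name] = list(docs)


async def main() -> StubMossClient:
    client = StubMossClient("your_project_id", "your_project_key")
    await client.create_index(
        "support-docs",
        [
            {"id": "1", "text": "Refunds are processed within 3-5 business days."},
            {"id": "2", "text": "You can track your order on the dashboard."},
        ],
    )
    return client


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, and closes it.
    client = asyncio.run(main())
    print(f"indexed {len(client.indexes['support-docs'])} documents")
```

Keeping every SDK call inside one `main()` coroutine means a single event loop is created per process, which is the usual pattern for short scripts.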

Why Moss?

Most retrieval stacks call out to a remote vector database. The round trip alone runs 200–500 ms - enough to break a real-time conversation.

Moss runs search and embedding inside your process. There's no network hop on the hot path, so query latency lands in the single digits - fast enough that retrieval disappears from the latency budget. If you're building a voice bot, a copilot, or any agent that talks to humans, that's the difference between a tool that feels alive and one that feels laggy.

Benchmarks

End-to-end query latency (embedding + search) on 100,000 documents, 750 measured queries, top_k=5. Tested on a MacBook Pro (M4 Pro, 24 GB).

| System | P50 | P95 | P99 | Mean |
|---|---|---|---|---|
| Moss | 3.1 ms | 4.3 ms | 5.4 ms | 3.3 ms |
| Pinecone | 432.6 ms | 732.1 ms | 934.2 ms | 485.8 ms |
| Qdrant | 597.6 ms | 682.0 ms | 771.4 ms | 596.5 ms |
| ChromaDB | 351.8 ms | 423.5 ms | 538.5 ms | 358.0 ms |

Moss's measurement includes embedding, which it runs in-process; the other systems use an external embedding service (Modal). Pinecone and Qdrant were queried through their cloud search services.

Reproduce these benchmarks →
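For readers who want to sanity-check figures like the P50/P95/P99 columns above, here is a minimal sketch of how such a summary can be computed from raw per-query timings. It is not the repository's benchmark harness: `summarize` and `time_queries` are hypothetical helpers, and the nearest-rank percentile definition is an assumption, since the source does not say which definition was used.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def percentile(sorted_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending sequence (p in 0..100)."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> dict[str, float]:
    """Return P50, P95, P99, and mean for a set of latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "mean": statistics.fmean(ordered),
    }


def time_queries(run_query: Callable[[str], object], queries: Sequence[str]) -> list[float]:
    """Time each call end to end and return the latencies in milliseconds."""
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Stand-in workload; a real run would call the search system under test.
    samples = time_queries(lambda q: sum(range(1000)), ["q"] * 750)
    print(summarize(samples))
```

Timing the whole call, including any embedding step, is what makes a figure "end to end"; timing only the vector lookup would understate what the caller actually waits for.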

Moss isn't a database! It's a search runtime. You don't manage clusters, tune HNSW parameters, or worry about sharding. You index documents, load them into the runtime, and query. That's it.

Features

Sub-10 ms semantic search - single-digit-ms p99 in our benchmarks

Hybrid search - semantic + keyword in a single query

Built-in embedding models - no OpenAI key required (or bring your own)

Metadata filtering - $eq, $and, $in, $near operators

Runs in the browser too - separate WebAssembly SDK (@moss-dev/moss-web) for client-side semantic search with no server

Database connectors - ingest directly from SQLite, MongoDB, MySQL, and Supabase (packages/moss-data-connector/)

CLI - manage indexes and query from the terminal (packages/moss-cli/)

SDKs - Python (3.10+), TypeScript / Node.js (20+), Elixir, and C (libmoss)

Framework integrations - LangChain, DSPy, LlamaIndex, Pipecat, LiveKit, Vapi, ElevenLabs, Strands Agents
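To make the metadata filtering operators concrete, here is a minimal sketch of how `$eq`, `$in`, and `$and` could be evaluated against a document's metadata. The filter shape (a Mongo-style nested dict) and the `matches` helper are assumptions for illustration, not Moss's documented syntax; `$near` is left out because the source does not describe its semantics.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies the (assumed) filter expression."""
    for key, cond in flt.items():
        if key == "$and":
            # $and takes a list of sub-filters that must all hold.
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            actual = metadata.get(key)
            for op, expected in cond.items():
                if op == "$eq":
                    if actual != expected:
                        return False
                elif op == "$in":
                    if actual not in expected:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        else:
            raise ValueError(f"expected an operator dict for field {key!r}")
    return True


docs = [
    {"id": "1", "metadata": {"category": "billing", "lang": "en"}},
    {"id": "2", "metadata": {"category": "shipping", "lang": "en"}},
    {"id": "3", "metadata": {"category": "billing", "lang": "de"}},
]

flt = {
    "$and": [
        {"category": {"$eq": "billing"}},
        {"lang": {"$in": ["en", "fr"]}},
    ]
}

selected = [d["id"] for d in docs if matches(d["metadata"], flt)]
print(selected)  # ['1']
```

Applying a filter like this before similarity scoring narrows the candidate set, so the ranking step only sees documents that are eligible to be returned.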

Examples

This repo contains working examples you can copy straight into your project:

examples/
├── python/                      # Python SDK samples
│   ├── load_and_query_sample.py
│   ├── comprehensive_sample.py
│   ├── custom_embedding_sample.py
│   └── metadata_filtering.py
├── python-classification/       # Classification example
├── javascript/                  # TypeScript SDK samples
│   ├── load_and_query_sample.ts
│   ├── comprehensive_sample.ts
│   └── custom_embedding_sample.ts
├── javascript-web/              # Browser / WASM SDK samples
├── c/                           # C SDK samples (libmoss)
├── go/                          # Go SDK samples
├── voice-agents/                # End-to-end voice agents (ambient + multi-agent)
│   ├── airline-pnr/             # Ambient retrieval; per-PNR Moss indexes, swap mid-call
│   └── mortgage-lending/        # Multi-agent flow with shared session state
└── cookbook/                    # Framework integrations
    ├── langchain/               # LangChain retriever
    ├── dspy/                    # DSPy module
    ├── crewai/                  # CrewAI integration
    ├── haystack/                # Haystack retriever
    ├── autogen/                 # AutoGen integration
    ├── mastra/                  # Mastra retriever
    ├── pydantic-ai/             # Pydantic AI integration
    └── daytona/                 # Daytona sandbox example

apps/
├── next-js/                     # Next.js semantic search UI
├── pipecat-moss/                # Pipecat voice agent with Moss retrieval
├── vapi-moss/                   # Vapi voice agent with Moss retrieval
├── elevenlabs-moss/             # ElevenLabs voice agent with Moss retrieval
├── livekit-moss-vercel/         # LiveKit voice agent on Vercel
├── agora-moss/                  # Agora Conversational AI MCP server with Moss retrieval
├── moss-llamaindex/             # LlamaIndex RAG backend + frontend
├── moss-bun/                    # Bun runtime example
└── docker/                      # Dockerized examples (ECS/K8s pattern)

moss-live-labs/                  # Experimental zone: prototypes and community demos
├── python/                      # Minimal Python quickstart + advanced query example
├── typescript/                  # Minimal TypeScript quickstart + advanced query example
├── examples/                    # Larger experiments (image search, voice agents)
│   ├── voice-agent/             # LiveKit + Moss voice assistant
│   ├── advanced-voice-agent/    # Persona impersonator built on a PDF knowledge base
│   └── image-search/            # FastAPI + React image search over COCO
└── community-demos/             # Community-contributed projects
    └── voice-agents/            # bharat-benefits, shoplabs-voice-agent

Run the Python examples

cd examples/python
pip install -r requirements.txt
cp ../../.env.example .env   # Add your credentials
python load_and_query_sample.py

Run the TypeScript examples

cd examples/javascript
npm install
cp ../../.env.example .env   # Add your credentials
npx tsx load_and_query_sample.ts

Run the Next.js app

cd apps/next-js
npm install
cp ../../.env.example .env   # Add your credentials
npm run dev   # Open http://localhost:3000

Run the Pipecat voice agent

Sub-10 ms retrieval plugged into Pipecat's real-time voice pipeline — a customer support agent that actually keeps up with conversation.

cd apps/pipecat-moss/pipecat-quickstart
# See README for setup and Pipecat Cloud deployment

Run the fully-local voice agent (Ollama + Moss + Pipecat)

A privacy-first voice AI stack: Ollama for LLM inference, Moss for retrieval, Pipecat for real-time audio - the LLM and retrieval both run on your machine.

cd apps/pipecat-moss/ollama-local
docker compose up

Full API reference: docs.moss.dev.

Integrations

| Framework | Status | Example |
|---|---|---|
| LangChain | Available | examples/cookbook/langchain/ |
| DSPy | Available | examples/cookbook/dspy/ |
| LlamaIndex | Available | apps/moss-llamaindex/ |
| CrewAI | Available | examples/cookbook/crewai/ |
| AutoGen | Available | examples/cookbook/autogen/ |
| Haystack | Available | examples/cookbook/haystack/ |
| Mastra | Available | examples/cookbook/mastra/ |
| Pydantic AI | Available | examples/cookbook/pydantic-ai/ |
| Pipecat | Available | apps/pipecat-moss/ |
| LiveKit | Available | apps/livekit-moss-vercel/ |
| Vapi | Available | apps/vapi-moss/ |
| ElevenLabs | Available | apps/elevenlabs-moss/ |
| Agora | Available | apps/agora-moss/ |
| Strands Agents | Available | packages/strands-agents-moss/ |
| Next.js | Available | apps/next-js/ |
| VitePress | Available | packages/vitepress-plugin-moss/ |
| Vercel AI SDK | Available | packages/vercel-sdk/ |

Architecture

Three parts:

Moss Cloud - handles ingestion, document embedding, storage, and distribution. Point the SDK at it with a project ID and key.

Index - your documents and their vectors, packaged as a single artifact that lives on Moss Cloud.

Runtime - embedded in your application. It pulls indexes over HTTPS, holds them in memory, and serves queries locally.

Once an index is loaded, queries don't leave your process - that's where the sub-10 ms latency comes from. Document changes flow through Moss Cloud and the runtime stays in sync.
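The load-once, query-locally flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Moss's implementation: `LocalRuntime`, its method names, the JSON artifact layout, and the brute-force cosine scan are all hypothetical, and the query vector is passed in directly instead of being produced by an embedding model.

```python
import json
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class LocalRuntime:
    """Hypothetical runtime: holds index artifacts in memory and queries them locally."""

    def __init__(self) -> None:
        self._indexes: dict[str, list[dict]] = {}

    def load_index(self, name: str, artifact: str) -> None:
        # In a real deployment the artifact would be fetched over HTTPS once;
        # here it is a JSON string so the sketch stays self-contained.
        self._indexes[name] = json.loads(artifact)["docs"]

    def query(self, name: str, query_vector: list[float], top_k: int = 5) -> list[tuple[float, str]]:
        scored = [
            (cosine(query_vector, doc["vector"]), doc["text"])
            for doc in self._indexes[name]
        ]
        # Highest similarity first; only in-memory data is touched here.
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


artifact = json.dumps({
    "docs": [
        {"text": "Refunds are processed within 3-5 business days.", "vector": [1.0, 0.0, 0.0]},
        {"text": "You can track your order on the dashboard.", "vector": [0.0, 1.0, 0.0]},
        {"text": "We offer 24/7 live chat support.", "vector": [0.0, 0.0, 1.0]},
    ]
})

runtime = LocalRuntime()
runtime.load_index("support-docs", artifact)
hits = runtime.query("support-docs", [0.9, 0.1, 0.0], top_k=2)
print(hits[0][1])
```

Because `query` reads only from memory, the network cost is paid once at load time; that separation, rather than any particular search algorithm, is the point of the sketch.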

Two ways to run the runtime

Server-side - moss (Python) and @moss-dev/moss (Node.js 20+) embed the runtime in your backend. Use this when your agent runs on a server.

Browser - @moss-dev/moss-web is a WebAssembly build that downloads the index and runs queries entirely client-side, no server required. Use this for static sites, browser extensions, and offline-first apps. See examples/javascript-web/.

Full Python SDK source code is available at sdks/python/.

Contributing

Here's where the community can have the most impact:

New SDK bindings — Swift, Go, Elixir,...

Framework integrations — CrewAI, Haystack, AutoGen

Reranking support — plug in cross-encoder rerankers

Doc-parsing connectors — PDF, DOCX, HTML, Markdown ingestion

Examples and tutorials — if you build something with Moss, we'd love to feature it

See our Contributing Guide for setup instructions and our Roadmap for what's planned.

Check out issues labeled good first issue to get started.

Community

Discord — ask questions, share what you're building

GitHub Issues — bug reports and feature requests

Twitter — announcements and updates

License

BSD 2-Clause License — the SDKs, examples, and integrations in this repo are fully open source.

Built by the team at Moss · Backed by Y Combinator
