2026-07-05 06:27 UTCIn-site rewrite6 min readUpdated: 2026-07-05 06:42 UTC

Fidx – Local semantic search in one SQLite file, no LLM at query

Fidx is a local semantic search engine that combines BM25 full-text search with 768-dimensional vector embeddings, all stored in a single SQLite file. It requires no LLM calls during queries, only a single ONNX embedding pass, enabling millisecond-class queries on CPU. It can be used as a CLI, a private RAG retriever, or the search layer behind agent memory.

SourceHacker News AIAuthor: williamliu_ai

Notifications You must be signed in to change notification settings

Fork 0

Star 0

BranchesTags

Open more actions menu

Folders and files

NameName

Last commit message

Last commit date

Latest commit

History

15 Commits

.github

.well-known

LICENSES

bench

docker

docs

scripts

src/fidx

tests

.dockerignore

.gitignore

.python-version

AI_ATTRIBUTION.md

ATTRIBUTIONS.md

CHANGELOG.md

CITATION.cff

CODE_OF_CONDUCT.md

CONTRIBUTING.md

LICENSE

NOTICE

README.md

SECURITY.md

SOURCE_HEADER_EXAMPLES.md

ai-attribution-policy.json

codemeta.json

pyproject.toml

uv.lock

Repository files navigation

Local AI search engine for your files and agents.

Fast, local-only semantic + keyword search for markdown, text, chat exports and code. CPU-only. No cloud, no GPU, no API keys. One SQLite file holds the full index. Use it as a CLI, a private RAG retriever, or the search layer behind agent memory: millisecond warm queries, JSON output, and collection scoping.

The demo corpus is the benchmark's synthetic chat data (attached to the release along with the doc corpora and query sets); demo-driver.sh there reproduces the recording.

Why

Local semantic search tools tend to buy recall with latency: LLM query expansion and LLM reranking push a single query to ~10 seconds on CPU. fidx takes a different trade — hybrid BM25 + 768-dim vector search fused with reciprocal-rank fusion (RRF), and no LLM calls in the query path. One ONNX embedding pass per query is the only model work.

Hybrid recall — FTS5 BM25 catches exact names and identifiers; 768-dim embeddings catch "that doc that discussed the indexing project"; RRF fuses both.

Millisecond-class queries — a warm daemon answers hybrid searches in 18–49 ms (p50, single ONNX thread) on 2k–19k-doc corpora; cold CLI calls stay well under a second.

One file — documents, BM25 index and vectors live in a single SQLite database (FTS5 + sqlite-vec). Copy it, back it up, delete it.

Scoped search — group sources into named collections (-c emails) and search only what you mean.

Requirements

Python 3.11 or 3.12 whose sqlite3 supports loadable extensions and FTS5 (fidx loads the sqlite-vec extension). Run fidx doctor to verify.

Prebuilt wheels exist for the verified platforms below — no compiler needed.

First fidx index downloads the embedding model once (then fully offline).

Platform (triple) Status Notes

Linux x86_64 ✅ verified (CI + Docker) any Python 3.11/3.12 with extensions

macOS arm64 (Apple Silicon) ✅ verified (CI) use Homebrew Python (see install)

Windows x86_64 ✅ verified (CI) python.org / uv Python

macOS Intel, Linux/Windows arm64 best-effort depends on upstream wheel availability

Install

Why is the package named fmdidx? The project and its command are fidx, but the PyPI name fidx was already taken by an unrelated package — so fidx is distributed as fmdidx ("fast markdown index"). That is the only place the name differs: uv tool install fmdidx installs the fidx command.

The recommended installer is uv, because a uv-managed Python ships loadable sqlite extensions on Linux and Windows.

Linux / Windows:

uv tool install fmdidx # or: pipx install fmdidx fidx doctor # verify your host

macOS: uv's bundled Python (and the python.org build) ship a sqlite3 without loadable-extension support, so use Homebrew Python:

brew install python uv tool install --python "$(brew --prefix python)/libexec/bin/python" fmdidx

or: pipx install --python "$(brew --prefix python)/libexec/bin/python3" fmdidx

fidx doctor

From a built wheel (works today, name-independent):

uv build pip install --only-binary=:all: dist/*.whl # use Homebrew Python on macOS fidx doctor

From a checkout (development):

uv sync && uv run fidx doctor

If fidx doctor reports a failure, it prints exactly what is missing and how to fix it — see Troubleshooting.

Quick start

Register directories as named collections

fidx collection add ~/notes --name notes fidx collection add ~/mail/export --name emails --glob "**/*.txt"

Scan + chunk + embed (incremental; first run downloads the ONNX model)

fidx index

Search (hybrid BM25 + vector by default)

fidx search "the doc that discussed the most recent indexing project" fidx search "Grace Hopper" --mode lexical # exact-name lookup, no model load fidx search "deployment checklist" -c notes # scope to one collection

Agent-friendly output

fidx search "error handling" --json -n 10 fidx search "auth" --files --min-score 0.02

Fetch a document by path or docid

fidx get "notes/meeting.md" fidx get "#a1b2c3"

Warm daemon (recommended for agents)

fidx serve & # keeps the model + index hot on a unix socket fidx search "..." # all searches now take milliseconds

The CLI uses the daemon automatically when it is running; --no-daemon opts out.

Agents and RAG tools

fidx is not an agent framework. Its integration surface is the CLI: register local files, index them, keep the daemon warm, then let an agent or workflow call fidx search --json and fidx get.

Index notes, memory exports, docs, or code as separate searchable scopes

fidx collection add ./memory --name memory \ --glob "/*.md" --glob "/*.txt" --glob "/*.jsonl" fidx collection add ./docs --name docs --glob "/*.md" --glob "/*.txt" fidx collection add ./src --name code \ --glob "/*.py" --glob "/*.ts" --glob "/*.go" --glob "**/*.rs"

fidx index fidx serve &

Agent-memory lookup: structured JSON for a tool call

fidx search "what did we decide about retry handling?" -c memory --json -n 5

Local RAG retrieval: search docs, then fetch the selected source text

fidx search "sqlite vector backend configuration" -c docs --json -n 5 fidx get --head "#a1b2c3"

Coding-agent context: return only matching paths for follow-up reads

fidx search "where is request timeout handled?" -c code --files -n 20

For MCP servers, LangChain/LangGraph tools, LlamaIndex retrievers, workflow nodes, or shell-based coding agents, the same contract is usually enough: fidx search --json returns ranked results with snippets, paths, docids and scores; fidx get expands a selected result by docid or collection/path.

Verifying your install

fidx doctor # host capability report (exit 0 = ready)

Full end-to-end benchmark on a ~1,000-doc corpus against the installed CLI:

python scripts/e2e_smoke.py # builds corpus, indexes, searches, gates recall

Clean-machine proof in pristine Docker containers (Linux):

scripts/verify-install.sh # builds the wheel, installs + runs e2e on 3.11 & 3.12

The same e2e runs in CI on Linux, macOS (arm64) and Windows × Python 3.11/3.12 (the install-matrix workflow) — installing the built wheel from scratch and asserting recall@10.

How it works

files ──> documents (SQLite) ──> FTS5 (BM25, porter) ─┐ │ ├─> RRF fusion ─> results └──> chunks ──> ONNX embeddings ─> sqlite-vec ┘

Chunking splits at the best structural break (headings, code-fence boundaries, blank lines) near a ~1800-char target with 15% overlap, never inside a code fence. Chunks store offsets, not copies.

Embeddings via fastembed/ONNX (CPU). The default profile is 768-dim; smaller profiles exist for small corpora.

Search runs BM25 and vector KNN in parallel and fuses with RRF (k=60); results are document-level with best-chunk snippets.

Troubleshooting

enable_load_extension / "sqlite3 was built without loadable-extension support" — your Python's sqlite cannot load sqlite-vec. This is the default on macOS system Python and uv/python.org macOS builds. Fix: install fidx with Homebrew Python (see macOS install above). fidx doctor confirms the fix.

sqlite-vec failed to load / wrong architecture — ensure a sqlite-vec wheel exists for your platform: pip install --only-binary=:all: sqlite-vec.

First search is slow / offline use — the embedding model downloads once on first index. Pre-seed FASTEMBED_CACHE_PATH to use fidx air-gapped.

Benchmarks

bench/ is a reproducible harness comparing fidx against QMD on four corpora with known-item queries (CPU-only, warm engines, idle box). Result quality has two axes that trade off: recall (is the right document in the top-10) and purity — noise@10 (share of returned results an LLM judge rated irrelevant, lower is better) and clean@10 (share of queries whose results contain zero noise, higher is better). And queries come in two regimes: known-item queries built from the document's own words (most lookups: names, identifiers, remembered phrases) and paraphrase queries that share almost no vocabulary with the target (pure semantic recall). The known-item headline: vs QMD's LLM hybrid, fidx wins both purity metrics on every corpus and recall on docs and code — recall ties on docs-small and is 0.004 behind on chat — at ~300–1000× lower latency on the doc/chat corpora and ~65× on the 92k-file code corpus.

Corpus (size) Engine R@10 ↑ noise@10 ↓ clean@10 ↑ p50 latency

docs-small (2k) fidx 0.933 0.219 0.560 20 ms

QMD query (LLM hybrid) 0.933 0.564 0.053 33 s

QMD search (FTS) 0.920 n/m n/m 78 ms

docs (18.8k) fidx 0.962 0.250 0.482 49 ms

QMD query 0.914 0.677 0.060 36 s

QMD search 0.896 0.353 0.818 87 ms

chat (8k) fidx 0.908 0.133 0.710 18 ms

QMD query 0.912 0.472 0.186 18 s

QMD search 0.916 0.086 0.964 81 ms

code (92.3k) fidx 0.864 0.127 0.704 452 ms

QMD query 0.782 0.713 0.056 33 s

QMD search 0.784 0.256 0.868 121 ms

fidx rows are measured with its built-in deterministic result truncation enabled (--truncate knee; ships off by default — without it fidx trades purity for recall, e.g. code R@10 0.900 at noise 0.297) and the e5-768 profile (fidx index --profile e5-768; the install default is nomic-768-q, which scored identically on code — see BENCHMARKS.md). "n/m" = not measured.

Hybrid vs hybrid (fidx vs QMD query): fidx wins noise@10 and clean@10 on every corpus, and recall on docs and code (tie on docs-small; 0.908 vs 0.912 on chat) — e.g. code recall +8 pts with 5.6× less noise — with no LLM anywhere in its query path.

Where QMD wins: its FTS search mode is the purity champion on chat (clean 0.964) and the latency champion on the big code corpus (121 ms vs fidx's 452 ms brute-force KNN over 92k vectors), but trails on recall where it matters (docs, code). QMD's pure-vector mode collapses to 0.048 R@10 on code.

fidx stays sub-second even on its weakest corpus and is ~65× faster than QMD's LLM modes there.

The semantic regime (paraphrase queries). After an independent reviewer correctly noted that known-item queries reward lexical overlap, we built an LLM-written, separately-LLM-validated paraphrase query set (~0.1 query→doc word overlap; checked into bench/data/) — same targets, no copied distinctive terms. Recall@10 there:

corpus fidx --mode vector fidx hybrid QMD search (BM25) QMD query (LLM)

docs-small (2k) 0.541 0.419 0.000 0.635

docs (18.8k) 0.450 0.385 0.000 pending

chat (8k) 0.373 0.317 0.000 pending

code (92.3k) 0.041 0.039 0.000 pending

Honest readings: BM25 gets literally zero without shared terms; fidx's vector arm does real semantic work on prose at millisecond latency; QMD's LLM expansion buys the best semantic recall where measured — at ~33 s vs 20 ms per query; fidx's hybrid fusion currently drags below its own vector mode on this regime (query-adaptive weighting is a roadmap item — use --mode vector for purely conceptual queries); and semantic search over 92k code files with a 768-d text embedder does not work — fidx's code strength is lexical.

Full tables (R@1

[truncated for AI cost control]