2026-05-31 20:10 UTCIn-site rewrite6 min readUpdated: 2026-06-30 13:03 UTC

Headroom compresses everything your AI agent reads before it reaches the LLM

Headroom is an open-source context compression layer that reduces token consumption by 50-90% by compressing all content (tool outputs, logs, RAG chunks, files, conversation history) before it reaches the LLM. It offers multiple integration modes (library, proxy, agent wrap, MCP server), supports various AI agents (Claude Code, Codex, Cursor, etc.), and preserves answer accuracy on benchmarks. The community has saved over 60B tokens.

SourceHacker News AIAuthor: mooreds

The Context Optimization Layer for LLM Applications - Cut costs by 50-90%

Maintainers

chopratejas

Project description

██╗ ██╗███████╗ █████╗ ██████╗ ██████╗ ██████╗ ██████╗ ███╗ ███╗ ██║ ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║ ███████║█████╗ ███████║██║ ██║██████╔╝██║ ██║██║ ██║██╔████╔██║ ██╔══██║██╔══╝ ██╔══██║██║ ██║██╔══██╗██║ ██║██║ ██║██║╚██╔╝██║ ██║ ██║███████╗██║ ██║██████╔╝██║ ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝ The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible

Docs · Install · Proof · Agents · Discord · llms.txt

AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Live: 10,144 → 1,260 tokens — same FATAL found.

What it does

Library — compress(messages) in Python or TypeScript, inline in any app

Proxy — headroom proxy --port 8787, zero code changes, any language

Agent wrap — headroom wrap claude|codex|cursor|aider|copilot in one command

MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client

Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup

headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md

Reversible (CCR) — originals never deleted; LLM retrieves on demand

How it works (30 seconds)

Your agent / app (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…) │ prompts · tool outputs · logs · RAG results · files ▼ ┌────────────────────────────────────────────────────┐ │ Headroom (runs locally — your data stays here) │ │ ─────────────────────────────────────────────── │ │ CacheAligner → ContentRouter → CCR │ │ ├─ SmartCrusher (JSON) │ │ ├─ CodeCompressor (AST) │ │ └─ Kompress-base (text, HF) │ │ │ │ Cross-agent memory · headroom learn · MCP │ └────────────────────────────────────────────────────┘ │ compressed prompt + retrieval tool ▼ LLM provider (Anthropic · OpenAI · Bedrock · …)

ContentRouter — detects content type, selects the right compressor

SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose

CacheAligner — stabilizes prefixes so provider KV caches actually hit

CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-base model card

Get started (60 seconds)

1 — Install

pip install "headroom-ai[all]" # Python npm install headroom-ai # Node / TypeScript

2 — Pick your mode

headroom wrap claude # wrap a coding agent headroom proxy --port 8787 # drop-in proxy, zero code changes

or: from headroom import compress # inline library

3 — See the savings

headroom stats

Granular extras: [proxy], [mcp], [ml], [agno], [langchain], [evals]. Requires Python 3.10+.

Proof

Savings on real agent workloads:

Workload Before After Savings

Code search (100 results) 17,765 1,408 92%

SRE incident debugging 65,694 5,118 92%

GitHub issue triage 54,174 14,761 73%

Codebase exploration 78,502 41,254 47%

Accuracy preserved on standard benchmarks:

Benchmark Category N Baseline Headroom Delta

GSM8K Math 100 0.870 0.870 ±0.000

TruthfulQA Factual 100 0.530 0.560 +0.030

SQuAD v2 QA 100 — 97% 19% compression

BFCL Tools 100 — 97% 32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

60B+ tokens saved by the community — live leaderboard →

Agent compatibility matrix

Agent headroom wrap Notes

Claude Code ● --memory · --code-graph

Codex ● shares memory with Claude

Cursor ● prints config — paste once

Aider ● starts proxy + launches

Copilot CLI ● starts proxy + launches

OpenClaw ● installs as ContextEngine plugin

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install.

When to use · When to skip

Great fit if you…

run AI coding agents daily and want savings without changing your code

work across multiple agents and want shared memory

need reversible compression — originals always retrievable via CCR

Skip it if you…

only use a single provider's native compaction and don't need cross-agent memory

work in a sandboxed environment where local processes can't run

Integrations — drop Headroom into any stack

Your setup Hook in with

Any Python app compress(messages, model=…)

Any TypeScript app await compress(messages, { model })

Anthropic / OpenAI SDK withHeadroom(new Anthropic()) · withHeadroom(new OpenAI())

Vercel AI SDK wrapLanguageModel({ model, middleware: headroomMiddleware() })

LiteLLM litellm.callbacks = [HeadroomCallback()]

LangChain HeadroomChatModel(your_llm)

Agno HeadroomAgnoModel(your_model)

Strands Strands guide

ASGI apps app.add_middleware(CompressionMiddleware)

Multi-agent SharedContext().put / .get

MCP clients headroom mcp install

What's inside

SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.

CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.

Kompress-base — our HuggingFace model, trained on agentic traces.

Image compression — 40–90% reduction via trained ML router.

CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.

IntelligentContext — score-based context fitting with learned importance.

CCR — reversible compression; LLM retrieves originals on demand.

Cross-agent memory — shared store, agent provenance, auto-dedup.

SharedContext — compressed context passing across multi-agent workflows.

headroom learn — plugin-based failure mining for Claude, Codex, Gemini.

Pipeline internals

Headroom exposes one stable request lifecycle across compress(), the SDK, and the proxy:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.

Pipeline extensions observe or customize lifecycle stages via on_pipeline_event(...).

Compression hooks sit alongside the canonical lifecycle as an additional extension seam.

Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.

CLI/tool slices: headroom/providers/claude, copilot, codex, openclaw

Provider runtime slices: headroom/providers/claude, gemini, plus shared backend/runtime dispatch in headroom/providers/registry.py

Core files stay orchestration-first: wrap.py, client.py, cli/proxy.py, and proxy/server.py delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

Install

pip install "headroom-ai[all]" # Python, everything npm install headroom-ai # TypeScript / Node docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [agno], [langchain], [evals]. Requires Python 3.10+.

Using pipx? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

headroom learn

headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md.

Documentation

Start here Go deeper

Quickstart Architecture

Proxy How compression works

MCP tools CCR — reversible compression

Memory Cache optimization

Failure learning Benchmarks

Configuration Limitations

Compared to

Headroom runs locally, covers every content type, works with every major framework, and is reversible.

Scope Deploy Local Reversible

Headroom All context — tools, RAG, logs, files, history Proxy · library · middleware · MCP Yes Yes

RTK CLI command outputs CLI wrapper Yes No

lean-ctx CLI commands, MCP tools, editor rules CLI wrapper · MCP Yes No

Compresr, Token Co. Text sent to their API Hosted API call No No

OpenAI Compaction Conversation history Provider-native No No

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git show --short, scoped ls, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use lean-ctx as the selected CLI context tool; set HEADROOM_CONTEXT_TOOL=lean-ctx before running headroom wrap ....

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom pip install -e ".[dev]" && pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.

Community

Live leaderboard — 60B+ tokens saved and counting.

Discord — questions, feedback, war stories.

Kompress-base on HuggingFace — the model behind our text compression.

License

Apache 2.0 — see LICENSE.

Project details

Maintainers

chopratejas

Release history

Release notifications | RSS feed

This version

0.22.3

May 21, 2026

0.22.2

May 20, 2026

0.22.1

May 20, 2026

0.22.0

May 19, 2026

0.21.39

May 15, 2026

0.21.38

May 15, 2026

0.21.37

May 14, 2026

0.21.34

May 13, 2026

0.21.33

May 13, 2026

0.21.32

May 13, 2026

0.21.31

May 13, 2026

0.21.30

May 12, 2026

0.21.29

May 12, 2026

0.21.28

May 12, 2026

0.21.27

May 12, 2026

0.21.26

May 11, 2026

0.21.25

May 11, 2026

0.21.24

May 11, 2026

0.21.23

May 11, 2026

0.21.22

May 11, 2026

0.21.21

May 10, 2026

0.21.20

May 10, 2026

0.21.19

May 10, 2026

0.21.18

May 10, 2026

0.21.17

May 9, 2026

0.21.16

May 9, 2026

0.21.15

May 9, 2026

0.21.14

May 9, 2026

0.21.13

May 8, 2026

0.21.12

May 8, 2026

0.21.11

May 8, 2026

0.21.10

May 8, 2026

0.21.9

May 8, 2026

0.21.8

May 8, 2026

0.21.7

May 8, 2026

0.21.6

May 8, 2026

0.21.5

May 6, 2026

0.21.4

May 6, 2026

0.21.3

May 6, 2026

0.21.2

May 5, 2026

0.21.1

May 5, 2026

0.21.0

May 5, 2026

0.20.27

May 5, 2026

0.20.26

May 5, 2026

0.20.21

May 4, 2026

0.20.20

May 4, 2026

0.20.19

May 4, 2026

0.20.18

May 4, 2026

0.20.17

May 4, 2026

0.20.16

May 4, 2026

0.20.15

May 3, 2026

0.20.14

May 3, 2026

0.20.13

May 3, 2026

0.20.12

May 2, 2026

0.20.11

May 2, 2026

0.20.10

May 2, 2026

0.20.9

May 2, 2026

0.20.8

May 1, 2026

0.20.7

May 1, 2026

0.20.6

May 1, 2026

0.20.5

May 1, 2026

0.20.4

Apr 30, 2026

0.20.3

Apr 30, 2026

0.20.2

Apr 30, 2026

0.20.1

Apr 30, 2026

0.20.0

Apr 30, 2026

0.19.0

Apr 30, 2026

0.18.2

Apr 30, 2026

0.18.1

Apr 30, 2026

0.18.0

Apr 30, 2026

0.17.0

Apr 30, 2026

0.16.0

Apr 30, 2026

0.15.1

Apr 29, 2026

0.15.0

Apr 29, 2026

0.14.5

Apr 29, 2026

0.14.4

Apr 29, 2026

0.14.3

Apr 29, 2026

0.14.2

Apr 29, 2026

0.14.1

Apr 28, 2026

0.14.0

Apr 28, 2026

0.13.8

Apr 28, 2026

0.13.7

Apr 28, 2026

0.13.6

Apr 28, 2026

0.13.5

Apr 28, 2026

0.13.4

Apr 28, 2026

0.13.3

Apr 28, 2026

0.13.2

Apr 28, 2026

0.13.1

Apr 28, 2026

0.13.0

Apr 28, 2026

0.12.0

Apr 27, 2026

0.11.0

Apr 27, 2026

0.10.19

Apr 27, 2026

0.10.18

Apr 27, 2026

0.10.17

Apr 26, 2026

0.10.16

Apr 26, 2026

0.10.15

Apr 25, 2026

0.10.14

Apr 25, 2026

0.10.13

Apr 25, 2026

0.10.12

Apr 25, 2026

0.10.11

Apr 25, 2026

0.10.10

Apr 24, 2026

0.10.9

Apr 24, 2026

0.10.8

Apr 23, 2026

0.10.7

Apr 23, 2026

0.10.6

Apr 23, 2026

0.10.5

Apr 23, 2026

0.10.4

Apr 23, 2026

0.10.3

Apr 23, 2026

0.10.2

Apr 23, 2026

0.10.1

Apr 23, 2026

0.10.0

Apr 23, 2026

0.9.7

Apr 22, 2026

0.9.6

Apr 22, 2026

0.9.5

Apr 22, 2026

0.9.4

Apr 22, 2026

0.9.3

Apr 22, 2026

0.9.2

Apr 22, 2026

0.9.1

Apr 22, 2026

0.9.0

Apr 22, 2026

0.8.3

Apr 21, 2026

0.8.2

Apr 21, 2026

0.8.1

Apr 21, 2026

0.8.0

Apr 21, 2026

0.7.4

Apr 21, 2026

0.7.3

Apr 21, 2026

0.7.2

Apr 21, 2026

[truncated for AI cost control]