Headroom by Tejas Chopra is one of the fastest-growing open-source tools in agent infrastructure—29.5K+ GitHub stars, 2K forks, and 155 releases as of June 2026 (latest: v0.25.0). It is not another LLM wrapper. It is a local-first context compression layer that sits between your agent and the provider:
Compress everything your AI agent reads—tool outputs, logs, RAG chunks, files, conversation history—before it reaches the LLM. Same answers, fraction of the tokens.
Live demo from the README: 10,144 → 1,260 tokens—same FATAL found in a log search.
If Karpathy's LLM Wiki solves what knowledge to compile, Headroom solves how much of that knowledge fits in the window. They stack: a maintained wiki reduces re-retrieval; Headroom shrinks what still ships to the model.
TL;DR
| Question | Answer |
|---|---|
| Repo | github.com/chopratejas/headroom |
| License | Apache 2.0 |
| Install | pip install "headroom-ai[all]" or npm install headroom-ai |
| Quick start | headroom wrap claude |
| Modes | Library · proxy · MCP · agent wrap |
| Savings | 60–95% on real workloads (vendor benchmarks) |
| Reversible | Yes — CCR caches originals locally |
| Model | Kompress-v2-base (HuggingFace) |
| Docs | headroom-docs.vercel.app |
The Problem: Context Is the Bottleneck
Coding agents burn tokens on:
| Source | Why it hurts |
|---|---|
| Tool outputs | grep, test logs, API JSON—verbose by default |
| RAG chunks | Retrieved docs repeat across turns |
| File reads | Whole files when a summary would suffice |
| Conversation history | Long sessions fill the window before the task finishes |
Provider-native compaction (OpenAI, Anthropic /compact) helps conversation but not arbitrary tool payloads. Hosted compression APIs send your data off-machine and often destroy reversibility.
Headroom's pitch: compress at the boundary, locally, with retrieval on demand.
Architecture (30 Seconds)
Your agent (Claude Code, Cursor, Codex, LangChain, …)
│ prompts · tool outputs · logs · RAG · files
▼
┌──────────────────────────────────────────────┐
│ Headroom (local — data stays on your machine) │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST / tree-sitter) │
│ └─ Kompress-base (prose, HuggingFace) │
│ Cross-agent memory · headroom learn · MCP │
└──────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
| Component | Role |
|---|---|
| ContentRouter | Detects content type, picks compressor |
| SmartCrusher | Structured JSON compression |
| CodeCompressor | AST-aware code shrinking |
| Kompress-base | Neural text compression (Kompress-v2-base) |
| CacheAligner | Stabilizes prefixes for provider KV cache hits |
| CCR | Stores originals; headroom_retrieve fetches full text |
The codebase is 78% Python, 17% Rust (performance core), plus TypeScript SDK—serious engineering, not a thin wrapper.
Four Ways to Run It
1. Agent wrap (fastest for coding agents)
pip install "headroom-ai[all]"
headroom wrap claude # Claude Code
headroom wrap codex # shares memory with Claude
headroom wrap cursor # prints config — paste once
headroom wrap aider
headroom wrap copilot
Flags like --memory and --code-graph extend Claude Code integration per the agent compatibility matrix.
2. Drop-in proxy (zero code changes)
headroom proxy --port 8787
Point any OpenAI-compatible client at localhost:8787. Works for custom apps, CI, or languages without a native SDK.
3. Library (inline)
from headroom import compress
compressed = compress(messages)
TypeScript: npm install headroom-ai.
4. MCP server
Tools exposed to any MCP client (Model Context Protocol):
| Tool | Purpose |
|---|---|
headroom_compress | Compress arbitrary context |
headroom_retrieve | Fetch CCR-cached originals |
headroom_stats | Token savings telemetry |
Install: headroom mcp install.
Token Savings: Real Workloads
From Headroom's published benchmarks on agent-shaped tasks:
| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
Accuracy on standard evals (reproduce with python -m headroom.evals suite --tier 1):
| Benchmark | Baseline | Headroom | Notes |
|---|---|---|---|
| GSM8K (math) | 0.870 | 0.870 | No delta |
| TruthfulQA | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | — | 97% acc | ~19% compression |
| BFCL (tools) | — | 97% acc | ~32% compression |
Measure your own runs: headroom perf.
CCR: Why Reversible Matters
Irreversible summarization fails when the model needs line 847 of the stack trace or one field in a 200-row JSON response. CCR pattern:
- Compress for the initial prompt
- Cache full originals locally (TTL-configurable)
- Expose
headroom_retrieveso the model pulls detail only when needed
This is the difference between "cheaper but blind" and "cheaper but auditable."
Cross-Agent Memory and headroom learn
Cross-agent memory — shared store across Claude, Codex, Gemini with auto-dedup. Stop re-explaining architecture every time you switch tools.
headroom learn — mines failed sessions, writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md. Compression reduces tokens; learning reduces repeated mistakes—orthogonal wins.
GitHub Copilot CLI Subscription Mode
Headroom can proxy Copilot CLI subscription traffic:
headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o
Headroom exchanges its GitHub OAuth token for Copilot's short-lived API token and sets COPILOT_PROVIDER_API_URL for the wrapper. Enterprise Server: set GITHUB_COPILOT_ENTERPRISE_DOMAIN. For Docker/CI, pass explicit GITHUB_COPILOT_TOKEN rather than relying on host keychain.
Agent Compatibility
| Agent | headroom wrap | Notes |
|---|---|---|
| Claude Code | ✅ | --memory, --code-graph |
| Codex | ✅ | Shared memory with Claude |
| Cursor | ✅ | Config snippet to paste |
| Aider | ✅ | Starts proxy + launches |
| Copilot CLI | ✅ | Subscription mode supported |
| OpenClaw | ✅ | ContextEngine plugin |
Any OpenAI-compatible client works via proxy. See Claude Code commands for /context and /compact alongside external compression.
Headroom vs Alternatives
| Tool | Scope | Local | Reversible |
|---|---|---|---|
| Headroom | All context types | ✅ | ✅ (CCR) |
| RTK | CLI command outputs | ✅ | ❌ |
| lean-ctx | CLI, MCP, editor rules | ✅ | ❌ |
| Compresr / Token Co. | Text via hosted API | ❌ | ❌ |
| OpenAI compaction | Conversation only | Provider | ❌ |
Headroom ships RTK for shell-output rewriting and can use lean-ctx via HEADROOM_CONTEXT_TOOL=lean-ctx—compress downstream of whichever CLI context tool you prefer.
Installation Details
# Python (everything)
pip install "headroom-ai[all]"
# Node / TypeScript
npm install headroom-ai
# Docker
docker pull ghcr.io/chopratejas/headroom:latest
Requires Python 3.10+. Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [langchain], [agno], [evals], [pytorch-mps] (Apple GPU embedder offload).
pipx: pipx install --python python3.13 "headroom-ai[all]"
Corporate SSL inspection
If pip install fails with CERTIFICATE_VERIFY_FAILED, install Rust first (maturin downloads rustup over TLS), or use prebuilt wheels: pip install --only-binary headroom-ai headroom-ai.
Runtime assets fetched over TLS:
- cdn.pyke.io — ONNX Runtime (or
ORT_STRATEGY=system) - huggingface.co — kompress-base model (or
HF_HUB_OFFLINE=1with pre-download)
Pure gateway mode (compression disabled) needs neither.
When to Use · When to Skip
Great fit if you:
- Run coding agents daily and want savings without rewriting your app
- Work across multiple agents and want shared memory
- Need reversible compression with local data residency
Skip if you:
- Only use one provider's native compaction and never hit tool-output bloat
- Cannot run local processes (strict sandbox with no proxy)
Headroom complements LLM wikis and OKF bundles—pre-compiled knowledge plus compressed delivery.
OpenClaw and Integrations
Headroom installs as an OpenClaw ContextEngine plugin. Integrations span LangChain, Agno, Strands, FastAPI middleware, and custom stacks—see llms.txt for machine-readable doc index.
Getting Started Checklist
pip install "headroom-ai[all]"headroom wrap claude(or your agent)- Run a heavy task—large grep, test output, RAG pull
headroom perf— inspect savings- Optional:
headroom mcp install, enableheadroom learn, tune CCR TTL
Summary
Headroom is the context compression layer the agent ecosystem was missing: local-first, multi-algorithm, reversible, and agent-native. 29.5K stars in roughly five months since OSS release reflects how universal the token problem is.
Install in 60 seconds. Wrap Claude Code in one command. Keep your data on your machine. Retrieve originals when the model needs them.
Related Reading
- Claude Code Commands Reference
- What is CLAUDE.md?
- What is MCP?
- Karpathy LLM Wiki Pattern
- Loop Engineering for Coding Agents
- RAG vs Agentic RAG
Features, benchmarks, and install paths cited from chopratejas/headroom README and docs as of June 14, 2026.