← Back to blog

explainx / blog

Headroom: Context Compression for AI Agents (Complete Guide)

Headroom by Tejas Chopra compresses tool outputs, logs, RAG chunks, and files before they reach the LLM—60–95% fewer tokens with reversible CCR. Library, proxy, MCP, wrap for Claude Code, Cursor, Codex, and benchmarks.

·7 min read·Yash Thakker
HeadroomContext CompressionClaude CodeMCPToken OptimizationAI Agents
Headroom: Context Compression for AI Agents (Complete Guide)

Headroom by Tejas Chopra is one of the fastest-growing open-source tools in agent infrastructure—29.5K+ GitHub stars, 2K forks, and 155 releases as of June 2026 (latest: v0.25.0). It is not another LLM wrapper. It is a local-first context compression layer that sits between your agent and the provider:

Compress everything your AI agent reads—tool outputs, logs, RAG chunks, files, conversation history—before it reaches the LLM. Same answers, fraction of the tokens.

Live demo from the README: 10,144 → 1,260 tokens—same FATAL found in a log search.

If Karpathy's LLM Wiki solves what knowledge to compile, Headroom solves how much of that knowledge fits in the window. They stack: a maintained wiki reduces re-retrieval; Headroom shrinks what still ships to the model.


TL;DR

QuestionAnswer
Repogithub.com/chopratejas/headroom
LicenseApache 2.0
Installpip install "headroom-ai[all]" or npm install headroom-ai
Quick startheadroom wrap claude
ModesLibrary · proxy · MCP · agent wrap
Savings60–95% on real workloads (vendor benchmarks)
ReversibleYes — CCR caches originals locally
ModelKompress-v2-base (HuggingFace)
Docsheadroom-docs.vercel.app

The Problem: Context Is the Bottleneck

Coding agents burn tokens on:

SourceWhy it hurts
Tool outputsgrep, test logs, API JSON—verbose by default
RAG chunksRetrieved docs repeat across turns
File readsWhole files when a summary would suffice
Conversation historyLong sessions fill the window before the task finishes

Provider-native compaction (OpenAI, Anthropic /compact) helps conversation but not arbitrary tool payloads. Hosted compression APIs send your data off-machine and often destroy reversibility.

Headroom's pitch: compress at the boundary, locally, with retrieval on demand.


Architecture (30 Seconds)

 Your agent (Claude Code, Cursor, Codex, LangChain, …)
        │  prompts · tool outputs · logs · RAG · files
        ▼
 ┌──────────────────────────────────────────────┐
 │  Headroom  (local — data stays on your machine) │
 │  CacheAligner → ContentRouter → CCR           │
 │    ├─ SmartCrusher    (JSON)                  │
 │    ├─ CodeCompressor (AST / tree-sitter)      │
 │    └─ Kompress-base   (prose, HuggingFace)    │
 │  Cross-agent memory · headroom learn · MCP    │
 └──────────────────────────────────────────────┘
        │  compressed prompt + retrieval tool
        ▼
 LLM provider (Anthropic · OpenAI · Bedrock · …)
ComponentRole
ContentRouterDetects content type, picks compressor
SmartCrusherStructured JSON compression
CodeCompressorAST-aware code shrinking
Kompress-baseNeural text compression (Kompress-v2-base)
CacheAlignerStabilizes prefixes for provider KV cache hits
CCRStores originals; headroom_retrieve fetches full text

The codebase is 78% Python, 17% Rust (performance core), plus TypeScript SDK—serious engineering, not a thin wrapper.


Four Ways to Run It

1. Agent wrap (fastest for coding agents)

pip install "headroom-ai[all]"
headroom wrap claude          # Claude Code
headroom wrap codex           # shares memory with Claude
headroom wrap cursor          # prints config — paste once
headroom wrap aider
headroom wrap copilot

Flags like --memory and --code-graph extend Claude Code integration per the agent compatibility matrix.

2. Drop-in proxy (zero code changes)

headroom proxy --port 8787

Point any OpenAI-compatible client at localhost:8787. Works for custom apps, CI, or languages without a native SDK.

3. Library (inline)

from headroom import compress

compressed = compress(messages)

TypeScript: npm install headroom-ai.

4. MCP server

Tools exposed to any MCP client (Model Context Protocol):

ToolPurpose
headroom_compressCompress arbitrary context
headroom_retrieveFetch CCR-cached originals
headroom_statsToken savings telemetry

Install: headroom mcp install.


Token Savings: Real Workloads

From Headroom's published benchmarks on agent-shaped tasks:

WorkloadBeforeAfterSavings
Code search (100 results)17,7651,40892%
SRE incident debugging65,6945,11892%
GitHub issue triage54,17414,76173%
Codebase exploration78,50241,25447%

Accuracy on standard evals (reproduce with python -m headroom.evals suite --tier 1):

BenchmarkBaselineHeadroomNotes
GSM8K (math)0.8700.870No delta
TruthfulQA0.5300.560+0.030
SQuAD v297% acc~19% compression
BFCL (tools)97% acc~32% compression

Measure your own runs: headroom perf.


CCR: Why Reversible Matters

Irreversible summarization fails when the model needs line 847 of the stack trace or one field in a 200-row JSON response. CCR pattern:

  1. Compress for the initial prompt
  2. Cache full originals locally (TTL-configurable)
  3. Expose headroom_retrieve so the model pulls detail only when needed

This is the difference between "cheaper but blind" and "cheaper but auditable."


Cross-Agent Memory and headroom learn

Cross-agent memory — shared store across Claude, Codex, Gemini with auto-dedup. Stop re-explaining architecture every time you switch tools.

headroom learn — mines failed sessions, writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md. Compression reduces tokens; learning reduces repeated mistakes—orthogonal wins.


GitHub Copilot CLI Subscription Mode

Headroom can proxy Copilot CLI subscription traffic:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

Headroom exchanges its GitHub OAuth token for Copilot's short-lived API token and sets COPILOT_PROVIDER_API_URL for the wrapper. Enterprise Server: set GITHUB_COPILOT_ENTERPRISE_DOMAIN. For Docker/CI, pass explicit GITHUB_COPILOT_TOKEN rather than relying on host keychain.


Agent Compatibility

Agentheadroom wrapNotes
Claude Code--memory, --code-graph
CodexShared memory with Claude
CursorConfig snippet to paste
AiderStarts proxy + launches
Copilot CLISubscription mode supported
OpenClawContextEngine plugin

Any OpenAI-compatible client works via proxy. See Claude Code commands for /context and /compact alongside external compression.


Headroom vs Alternatives

ToolScopeLocalReversible
HeadroomAll context types✅ (CCR)
RTKCLI command outputs
lean-ctxCLI, MCP, editor rules
Compresr / Token Co.Text via hosted API
OpenAI compactionConversation onlyProvider

Headroom ships RTK for shell-output rewriting and can use lean-ctx via HEADROOM_CONTEXT_TOOL=lean-ctx—compress downstream of whichever CLI context tool you prefer.


Installation Details

# Python (everything)
pip install "headroom-ai[all]"

# Node / TypeScript
npm install headroom-ai

# Docker
docker pull ghcr.io/chopratejas/headroom:latest

Requires Python 3.10+. Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [langchain], [agno], [evals], [pytorch-mps] (Apple GPU embedder offload).

pipx: pipx install --python python3.13 "headroom-ai[all]"

Corporate SSL inspection

If pip install fails with CERTIFICATE_VERIFY_FAILED, install Rust first (maturin downloads rustup over TLS), or use prebuilt wheels: pip install --only-binary headroom-ai headroom-ai.

Runtime assets fetched over TLS:

  • cdn.pyke.io — ONNX Runtime (or ORT_STRATEGY=system)
  • huggingface.co — kompress-base model (or HF_HUB_OFFLINE=1 with pre-download)

Pure gateway mode (compression disabled) needs neither.


When to Use · When to Skip

Great fit if you:

  • Run coding agents daily and want savings without rewriting your app
  • Work across multiple agents and want shared memory
  • Need reversible compression with local data residency

Skip if you:

  • Only use one provider's native compaction and never hit tool-output bloat
  • Cannot run local processes (strict sandbox with no proxy)

Headroom complements LLM wikis and OKF bundles—pre-compiled knowledge plus compressed delivery.


OpenClaw and Integrations

Headroom installs as an OpenClaw ContextEngine plugin. Integrations span LangChain, Agno, Strands, FastAPI middleware, and custom stacks—see llms.txt for machine-readable doc index.


Getting Started Checklist

  1. pip install "headroom-ai[all]"
  2. headroom wrap claude (or your agent)
  3. Run a heavy task—large grep, test output, RAG pull
  4. headroom perf — inspect savings
  5. Optional: headroom mcp install, enable headroom learn, tune CCR TTL

Summary

Headroom is the context compression layer the agent ecosystem was missing: local-first, multi-algorithm, reversible, and agent-native. 29.5K stars in roughly five months since OSS release reflects how universal the token problem is.

Install in 60 seconds. Wrap Claude Code in one command. Keep your data on your machine. Retrieve originals when the model needs them.


Related Reading

Features, benchmarks, and install paths cited from chopratejas/headroom README and docs as of June 14, 2026.

Related posts