Headroom is an open-source (Apache 2.0) context compression layer for AI agents by Tejas Chopra. It compresses tool outputs, logs, RAG chunks, files, and conversation history before they reach the LLM—typically 60–95% fewer tokens while preserving answer quality. It runs locally as a Python/TypeScript library, drop-in proxy, MCP server, or agent wrapper.

How do I install and use Headroom with Claude Code?

pip install "headroom-ai[all]" (Python 3.10+) or npm install headroom-ai. Then run headroom wrap claude to wrap Claude Code with compression enabled. Alternatives: headroom proxy --port 8787 for any OpenAI-compatible client, or from headroom import compress for inline library use. Check savings with headroom perf.

What is CCR reversible compression in Headroom?

CCR (Cache-Compressed Retrieval) stores original content locally after compression. The LLM receives a compressed prompt plus a headroom_retrieve tool; if it needs full detail, it fetches originals on demand within a configured TTL. This avoids the irreversible loss of hosted summarization APIs.

How much token savings does Headroom claim?

Headroom reports 60–95% savings on real agent workloads—for example code search (100 results) 17,765 to 1,408 tokens (92%), SRE incident debugging 65,694 to 5,118 (92%), and GitHub issue triage 54,174 to 14,761 (73%). Benchmarks on GSM8K, TruthfulQA, SQuAD v2, and BFCL show accuracy preserved or slightly improved at 19–32% compression on QA/tool tasks.

How does Headroom compare to OpenAI compaction or RTK?

Headroom runs locally, covers all content types (JSON, code AST, prose), works across agents and frameworks, and is reversible via CCR. OpenAI compaction is provider-native and conversation-only. RTK and lean-ctx focus on CLI/MCP context rewriting; Headroom compresses everything downstream and can integrate RTK or lean-ctx as upstream context tools.

What is headroom learn?

headroom learn mines failed agent sessions and writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md—turning compression-adjacent failures into persistent instructions so the same mistakes do not repeat. It complements cross-agent memory and the MCP headroom_stats tooling.

Headroom AI: Context Compression for Agents Guide | explainx.ai Blog

Headroom by Tejas Chopra is one of the fastest-growing open-source tools in agent infrastructure—29.5K+ GitHub stars, 2K forks, and 155 releases as of June 2026 (latest: v0.25.0). It is not another LLM wrapper. It is a local-first context compression layer that sits between your agent and the provider:

Compress everything your AI agent reads—tool outputs, logs, RAG chunks, files, conversation history—before it reaches the LLM. Same answers, fraction of the tokens.

Live demo from the README: 10,144 → 1,260 tokens—same FATAL found in a log search.

If Karpathy's LLM Wiki solves what knowledge to compile, Headroom solves how much of that knowledge fits in the window. They stack: a maintained wiki reduces re-retrieval; Headroom shrinks what still ships to the model.

TL;DR

Question	Answer
Repo	github.com/chopratejas/headroom
License	Apache 2.0
Install	`pip install "headroom-ai[all]"` or `npm install headroom-ai`
Quick start	`headroom wrap claude`
Modes	Library · proxy · MCP · agent wrap
Savings	60–95% on real workloads (vendor benchmarks)
Reversible	Yes — CCR caches originals locally
Model	Kompress-v2-base (HuggingFace)
Docs	headroom-docs.vercel.app

The Problem: Context Is the Bottleneck

Coding agents burn tokens on:

Source	Why it hurts
Tool outputs	`grep`, test logs, API JSON—verbose by default
RAG chunks	Retrieved docs repeat across turns
File reads	Whole files when a summary would suffice
Conversation history	Long sessions fill the window before the task finishes

Provider-native compaction (OpenAI, Anthropic /compact) helps conversation but not arbitrary tool payloads. Hosted compression APIs send your data off-machine and often destroy reversibility.

Headroom's pitch: compress at the boundary, locally, with retrieval on demand.

Architecture (30 Seconds)

snippet

 Your agent (Claude Code, Cursor, Codex, LangChain, …)
        │  prompts · tool outputs · logs · RAG · files
        ▼
 ┌──────────────────────────────────────────────┐
 │  Headroom  (local — data stays on your machine) │
 │  CacheAligner → ContentRouter → CCR           │
 │    ├─ SmartCrusher    (JSON)                  │
 │    ├─ CodeCompressor (AST / tree-sitter)      │
 │    └─ Kompress-base   (prose, HuggingFace)    │
 │  Cross-agent memory · headroom learn · MCP    │
 └──────────────────────────────────────────────┘
        │  compressed prompt + retrieval tool
        ▼
 LLM provider (Anthropic · OpenAI · Bedrock · …)

Component	Role
ContentRouter	Detects content type, picks compressor
SmartCrusher	Structured JSON compression
CodeCompressor	AST-aware code shrinking
Kompress-base	Neural text compression (Kompress-v2-base)
CacheAligner	Stabilizes prefixes for provider KV cache hits
CCR	Stores originals; `headroom_retrieve` fetches full text

The codebase is 78% Python, 17% Rust (performance core), plus TypeScript SDK—serious engineering, not a thin wrapper.

Four Ways to Run It

1. Agent wrap (fastest for coding agents)

bash

pip install "headroom-ai[all]"
headroom wrap claude          # Claude Code
headroom wrap codex           # shares memory with Claude
headroom wrap cursor          # prints config — paste once
headroom wrap aider
headroom wrap copilot

Flags like --memory and --code-graph extend Claude Code integration per the agent compatibility matrix.

2. Drop-in proxy (zero code changes)

bash

headroom proxy --port 8787

Point any OpenAI-compatible client at localhost:8787. Works for custom apps, CI, or languages without a native SDK.

3. Library (inline)

python

from headroom import compress

compressed = compress(messages)

TypeScript: npm install headroom-ai.

4. MCP server

Tools exposed to any MCP client (Model Context Protocol):

Tool	Purpose
`headroom_compress`	Compress arbitrary context
`headroom_retrieve`	Fetch CCR-cached originals
`headroom_stats`	Token savings telemetry

Install: headroom mcp install.

Token Savings: Real Workloads

From Headroom's published benchmarks on agent-shaped tasks:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy on standard evals (reproduce with python -m headroom.evals suite --tier 1):

Benchmark	Baseline	Headroom	Notes
GSM8K (math)	0.870	0.870	No delta
TruthfulQA	0.530	0.560	+0.030
SQuAD v2	—	97% acc	~19% compression
BFCL (tools)	—	97% acc	~32% compression

Measure your own runs: headroom perf.

CCR: Why Reversible Matters

Irreversible summarization fails when the model needs line 847 of the stack trace or one field in a 200-row JSON response. CCR pattern:

Compress for the initial prompt
Cache full originals locally (TTL-configurable)
Expose headroom_retrieve so the model pulls detail only when needed

This is the difference between "cheaper but blind" and "cheaper but auditable."

Cross-Agent Memory and headroom learn

Cross-agent memory — shared store across Claude, Codex, Gemini with auto-dedup. Stop re-explaining architecture every time you switch tools.

headroom learn — mines failed sessions, writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md. Compression reduces tokens; learning reduces repeated mistakes—orthogonal wins.

GitHub Copilot CLI Subscription Mode

Headroom can proxy Copilot CLI subscription traffic:

bash

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

Headroom exchanges its GitHub OAuth token for Copilot's short-lived API token and sets COPILOT_PROVIDER_API_URL for the wrapper. Enterprise Server: set GITHUB_COPILOT_ENTERPRISE_DOMAIN. For Docker/CI, pass explicit GITHUB_COPILOT_TOKEN rather than relying on host keychain.

Agent Compatibility

Agent	`headroom wrap`	Notes
Claude Code	✅	`--memory`, `--code-graph`
Codex	✅	Shared memory with Claude
Cursor	✅	Config snippet to paste
Aider	✅	Starts proxy + launches
Copilot CLI	✅	Subscription mode supported
OpenClaw	✅	ContextEngine plugin

Any OpenAI-compatible client works via proxy. See Claude Code commands for /context and /compact alongside external compression.

Headroom vs Alternatives

Tool	Scope	Local	Reversible
Headroom	All context types	✅	✅ (CCR)
RTK	CLI command outputs	✅	❌
lean-ctx	CLI, MCP, editor rules	✅	❌
Compresr / Token Co.	Text via hosted API	❌	❌
OpenAI compaction	Conversation only	Provider	❌

Headroom ships RTK for shell-output rewriting and can use lean-ctx via HEADROOM_CONTEXT_TOOL=lean-ctx—compress downstream of whichever CLI context tool you prefer.

Installation Details

bash

# Python (everything)
pip install "headroom-ai[all]"

# Node / TypeScript
npm install headroom-ai

# Docker
docker pull ghcr.io/chopratejas/headroom:latest

Requires Python 3.10+. Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [langchain], [agno], [evals], [pytorch-mps] (Apple GPU embedder offload).

pipx: pipx install --python python3.13 "headroom-ai[all]"

Corporate SSL inspection

If pip install fails with CERTIFICATE_VERIFY_FAILED, install Rust first (maturin downloads rustup over TLS), or use prebuilt wheels: pip install --only-binary headroom-ai headroom-ai.

Runtime assets fetched over TLS:

cdn.pyke.io — ONNX Runtime (or ORT_STRATEGY=system)
huggingface.co — kompress-base model (or HF_HUB_OFFLINE=1 with pre-download)

Pure gateway mode (compression disabled) needs neither.

When to Use · When to Skip

Great fit if you:

Run coding agents daily and want savings without rewriting your app
Work across multiple agents and want shared memory
Need reversible compression with local data residency

Skip if you:

Only use one provider's native compaction and never hit tool-output bloat
Cannot run local processes (strict sandbox with no proxy)

Headroom complements LLM wikis and OKF bundles—pre-compiled knowledge plus compressed delivery.

OpenClaw and Integrations

Headroom installs as an OpenClaw ContextEngine plugin. Integrations span LangChain, Agno, Strands, FastAPI middleware, and custom stacks—see llms.txt for machine-readable doc index.

Getting Started Checklist

pip install "headroom-ai[all]"
headroom wrap claude (or your agent)
Run a heavy task—large grep, test output, RAG pull
headroom perf — inspect savings
Optional: headroom mcp install, enable headroom learn, tune CCR TTL

Summary

Headroom is the context compression layer the agent ecosystem was missing: local-first, multi-algorithm, reversible, and agent-native. 29.5K stars in roughly five months since OSS release reflects how universal the token problem is.

Install in 60 seconds. Wrap Claude Code in one command. Keep your data on your machine. Retrieve originals when the model needs them.

Features, benchmarks, and install paths cited from chopratejas/headroom README and docs as of June 14, 2026.

Compress everything your AI agent reads—tool outputs, logs, RAG chunks, files, conversation history—before it reaches the LLM. Same answers, fraction of the tokens.

Live demo from the README: 10,144 → 1,260 tokens—same FATAL found in a log search.

TL;DR

Question	Answer
Repo	github.com/chopratejas/headroom
License	Apache 2.0
Install	`pip install "headroom-ai[all]"` or `npm install headroom-ai`
Quick start	`headroom wrap claude`
Modes	Library · proxy · MCP · agent wrap
Savings	60–95% on real workloads (vendor benchmarks)
Reversible	Yes — CCR caches originals locally
Model	Kompress-v2-base (HuggingFace)
Docs	headroom-docs.vercel.app

The Problem: Context Is the Bottleneck

Coding agents burn tokens on:

Source	Why it hurts
Tool outputs	`grep`, test logs, API JSON—verbose by default
RAG chunks	Retrieved docs repeat across turns
File reads	Whole files when a summary would suffice
Conversation history	Long sessions fill the window before the task finishes

Headroom's pitch: compress at the boundary, locally, with retrieval on demand.

Architecture (30 Seconds)

snippet

 Your agent (Claude Code, Cursor, Codex, LangChain, …)
        │  prompts · tool outputs · logs · RAG · files
        ▼
 ┌──────────────────────────────────────────────┐
 │  Headroom  (local — data stays on your machine) │
 │  CacheAligner → ContentRouter → CCR           │
 │    ├─ SmartCrusher    (JSON)                  │
 │    ├─ CodeCompressor (AST / tree-sitter)      │
 │    └─ Kompress-base   (prose, HuggingFace)    │
 │  Cross-agent memory · headroom learn · MCP    │
 └──────────────────────────────────────────────┘
        │  compressed prompt + retrieval tool
        ▼
 LLM provider (Anthropic · OpenAI · Bedrock · …)

Component	Role
ContentRouter	Detects content type, picks compressor
SmartCrusher	Structured JSON compression
CodeCompressor	AST-aware code shrinking
Kompress-base	Neural text compression (Kompress-v2-base)
CacheAligner	Stabilizes prefixes for provider KV cache hits
CCR	Stores originals; `headroom_retrieve` fetches full text

The codebase is 78% Python, 17% Rust (performance core), plus TypeScript SDK—serious engineering, not a thin wrapper.

Four Ways to Run It

1. Agent wrap (fastest for coding agents)

bash

pip install "headroom-ai[all]"
headroom wrap claude          # Claude Code
headroom wrap codex           # shares memory with Claude
headroom wrap cursor          # prints config — paste once
headroom wrap aider
headroom wrap copilot

Flags like --memory and --code-graph extend Claude Code integration per the agent compatibility matrix.

2. Drop-in proxy (zero code changes)

bash

headroom proxy --port 8787

Point any OpenAI-compatible client at localhost:8787. Works for custom apps, CI, or languages without a native SDK.

3. Library (inline)

python

from headroom import compress

compressed = compress(messages)

TypeScript: npm install headroom-ai.

4. MCP server

Tools exposed to any MCP client (Model Context Protocol):

Tool	Purpose
`headroom_compress`	Compress arbitrary context
`headroom_retrieve`	Fetch CCR-cached originals
`headroom_stats`	Token savings telemetry

Install: headroom mcp install.

Token Savings: Real Workloads

From Headroom's published benchmarks on agent-shaped tasks:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy on standard evals (reproduce with python -m headroom.evals suite --tier 1):

Benchmark	Baseline	Headroom	Notes
GSM8K (math)	0.870	0.870	No delta
TruthfulQA	0.530	0.560	+0.030
SQuAD v2	—	97% acc	~19% compression
BFCL (tools)	—	97% acc	~32% compression

Measure your own runs: headroom perf.

CCR: Why Reversible Matters

Irreversible summarization fails when the model needs line 847 of the stack trace or one field in a 200-row JSON response. CCR pattern:

Compress for the initial prompt
Cache full originals locally (TTL-configurable)
Expose headroom_retrieve so the model pulls detail only when needed

This is the difference between "cheaper but blind" and "cheaper but auditable."

Cross-Agent Memory and headroom learn

Cross-agent memory — shared store across Claude, Codex, Gemini with auto-dedup. Stop re-explaining architecture every time you switch tools.

headroom learn — mines failed sessions, writes corrections to CLAUDE.md, AGENTS.md, or GEMINI.md. Compression reduces tokens; learning reduces repeated mistakes—orthogonal wins.

GitHub Copilot CLI Subscription Mode

Headroom can proxy Copilot CLI subscription traffic:

bash

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

Agent Compatibility

Agent	`headroom wrap`	Notes
Claude Code	✅	`--memory`, `--code-graph`
Codex	✅	Shared memory with Claude
Cursor	✅	Config snippet to paste
Aider	✅	Starts proxy + launches
Copilot CLI	✅	Subscription mode supported
OpenClaw	✅	ContextEngine plugin

Any OpenAI-compatible client works via proxy. See Claude Code commands for /context and /compact alongside external compression.

Headroom vs Alternatives

Tool	Scope	Local	Reversible
Headroom	All context types	✅	✅ (CCR)
RTK	CLI command outputs	✅	❌
lean-ctx	CLI, MCP, editor rules	✅	❌
Compresr / Token Co.	Text via hosted API	❌	❌
OpenAI compaction	Conversation only	Provider	❌

Headroom ships RTK for shell-output rewriting and can use lean-ctx via HEADROOM_CONTEXT_TOOL=lean-ctx—compress downstream of whichever CLI context tool you prefer.

Installation Details

bash

# Python (everything)
pip install "headroom-ai[all]"

# Node / TypeScript
npm install headroom-ai

# Docker
docker pull ghcr.io/chopratejas/headroom:latest

Requires Python 3.10+. Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [langchain], [agno], [evals], [pytorch-mps] (Apple GPU embedder offload).

pipx: pipx install --python python3.13 "headroom-ai[all]"

Corporate SSL inspection

If pip install fails with CERTIFICATE_VERIFY_FAILED, install Rust first (maturin downloads rustup over TLS), or use prebuilt wheels: pip install --only-binary headroom-ai headroom-ai.

Runtime assets fetched over TLS:

cdn.pyke.io — ONNX Runtime (or ORT_STRATEGY=system)
huggingface.co — kompress-base model (or HF_HUB_OFFLINE=1 with pre-download)

Pure gateway mode (compression disabled) needs neither.

When to Use · When to Skip

Great fit if you:

Run coding agents daily and want savings without rewriting your app
Work across multiple agents and want shared memory
Need reversible compression with local data residency

Skip if you:

Only use one provider's native compaction and never hit tool-output bloat
Cannot run local processes (strict sandbox with no proxy)

Headroom complements LLM wikis and OKF bundles—pre-compiled knowledge plus compressed delivery.

OpenClaw and Integrations

Headroom installs as an OpenClaw ContextEngine plugin. Integrations span LangChain, Agno, Strands, FastAPI middleware, and custom stacks—see llms.txt for machine-readable doc index.

Getting Started Checklist

pip install "headroom-ai[all]"
headroom wrap claude (or your agent)
Run a heavy task—large grep, test output, RAG pull
headroom perf — inspect savings
Optional: headroom mcp install, enable headroom learn, tune CCR TTL

Summary

Install in 60 seconds. Wrap Claude Code in one command. Keep your data on your machine. Retrieve originals when the model needs them.

Features, benchmarks, and install paths cited from chopratejas/headroom README and docs as of June 14, 2026.

TL;DR

The Problem: Context Is the Bottleneck

Architecture (30 Seconds)

Four Ways to Run It

1. Agent wrap (fastest for coding agents)

2. Drop-in proxy (zero code changes)

3. Library (inline)

4. MCP server

Token Savings: Real Workloads

CCR: Why Reversible Matters

Cross-Agent Memory and headroom learn

GitHub Copilot CLI Subscription Mode

Agent Compatibility

Headroom vs Alternatives

Installation Details

Corporate SSL inspection

When to Use · When to Skip

OpenClaw and Integrations

Getting Started Checklist

Summary

Related Reading

TL;DR

The Problem: Context Is the Bottleneck

Architecture (30 Seconds)

Four Ways to Run It

1. Agent wrap (fastest for coding agents)

2. Drop-in proxy (zero code changes)

3. Library (inline)

4. MCP server

Token Savings: Real Workloads

CCR: Why Reversible Matters

Cross-Agent Memory and headroom learn

GitHub Copilot CLI Subscription Mode

Agent Compatibility

Headroom vs Alternatives

Installation Details

Corporate SSL inspection

When to Use · When to Skip

OpenClaw and Integrations

Getting Started Checklist

Summary

Related Reading

Related posts

CLAUDE.md vs SKILL.md vs MCP: The Modern Agent Stack Explained

Build Your First MCP Server: A Step-by-Step Guide (2026)

Unreal Engine 5.8 AI Integration: Claude, Codex, and MCP Editor Control

Related posts

CLAUDE.md vs SKILL.md vs MCP: The Modern Agent Stack Explained

Build Your First MCP Server: A Step-by-Step Guide (2026)

Unreal Engine 5.8 AI Integration: Claude, Codex, and MCP Editor Control