Reduce Claude Code input token usage by 10-25% offline (zero cost) or 60-75% with API (optional). Offline by default — no API key needed. Works on Windows, Mac, Linux.
Companion to caveman (output compression). This compresses input: session memory, CLAUDE.md, auto-memory, tool outputs.
claude plugin install github:rico03/claude-context-compressor
That's it. No configuration needed. Hooks activate automatically on next session.
Requirements: Python 3.9+ (any version, auto-detected on all platforms)
Every Claude Code session re-injects your full CLAUDE.md and auto-memory. After 10+ sessions that's thousands of redundant tokens on every message. This plugin intercepts and compresses them.
| Hook | Event | What it compresses |
|---|---|---|
SessionStart | Session open | CLAUDE.md + auto-memory → compressed |
UserPromptSubmit | Each message | facts store → only relevant facts injected |
PostToolUse | After Bash/Read/Grep | large outputs → truncated |
Stop | Session end | extracts key facts → merges into facts store |
PreCompact | Before /compact | backs up critical facts |
Facts store merge logic:
current_state, preferences, bugs → latest value overrides olddecisions, architecture → accumulate forever (never lost)Optional. Create .claude/compress.config.json in your project to override defaults:
{
"level": "input-only"
}
| Level | Input | Output | Notes |
|---|---|---|---|
off | ❌ | ❌ | Disable everything |
input-only | ✅ aggressive | ❌ | Default |
balanced | ✅ light | caveman lite | Pair with caveman |
max | ✅ aggressive | caveman ultra | Maximum savings |
By default the plugin runs fully offline using heuristic compression (no API key required, zero cost). To unlock higher compression via Claude API:
{
"use_api": true,
"model": "claude-haiku-4-5-20251001"
}
API mode adds: semantic memory compression, smart fact selection per prompt, intelligent tool output summarization, and session fact extraction.
{
"input": {
"enabled": true,
"memory_compression": "aggressive",
"tool_output_limit": 200,
"min_tokens_to_compress": 100
}
}
After install, use /compress in Claude Code:
/compress status — current config + tokens saved this session/compress level max — switch preset/compress log — last 10 compression events/compress facts — view current facts storepython -c "
import json
from pathlib import Path
log = Path('.claude/token_log.jsonl')
lines = [json.loads(l) for l in log.read_text().splitlines() if l]
total = sum(l['saved'] for l in lines)
for l in lines[-5:]:
print(f' [{l[\"hook\"]:25}] {l[\"before\"]:>5}→{l[\"after\"]:>5} ({l[\"reduction_pct\"]:>5.1f}%)')
print(f'\n Total saved: {total:,} tokens')
"
claude plugin install github:JuliusBrussee/caveman
claude plugin install github:rico03/claude-context-compressor
Set "level": "max" in your config. Together: ~70% total token reduction.
MIT
Executes bash commands
Hook triggers when Bash tool is used
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
npx claudepluginhub rico-0-3/claude-context-compressor --plugin context-compressorToken optimization for Claude Code. Automatic tool output compression (40-60% reduction), token meter in statusline, auto-compact at 70% context, structured output formats, self-building project wiki, URL ingestion, Karpathy compile pass, and semantic lint.
Governor: always-on compact professional output, telemetry, context slimming, tool-output filtering, prompt guidance, and drift guardrails for Claude Code Max users.
Opus 4.8-aware context optimization: silent-by-default hooks, honest NET token savings, big-file map-then-load, Context Control Center, per-task tracking, prompt coach
Open-source, local-first Claude Code plugin for token reduction, context compression, and cost optimization using hybrid RAG retrieval (BM25 + vector search), reranking, AST-aware chunking, and compact context packets.
Audit, fix, and monitor Claude Code context window usage. Find the ghost tokens.
Analyze and reduce Claude Code token overhead