From claude-token-reducer
Reduces token usage by chunking large code/docs into FTS5-indexed and embedded pieces, retrieving/reranking top chunks via BM25/vectors, and summarizing into compact citation-rich packets.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-token-reducer:token-reducerThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Cut context size without cutting answer quality.
Cut context size without cutting answer quality.
Claude Code often answers code questions with native Read / Grep on whole files, which loads raw text into the model and bypasses this pipeline. Long chat history is re-sent every turn, so costs compound even when code is compressed.
Do not paste large code or logs into chat — that bypasses reduction and burns tokens.
Run the slash command first so the pipeline runs before reasoning, for example: use /token-reducer with a short objective and paths (defaults come from plugin settings.json: small chunks, low --top-k, word budget, relevanceFloor).
CLI (same pipeline) when you want a packet on disk or in a script:
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs ./src --query "Locate JWT validation" --top-k 3
Use a specific query (not “auth stuff”) so low-scoring chunks are dropped by the relevance floor before summarization.
Session hygiene: around 10 turns the hook suggests /compact; by 40–50 turns start a new chat for coding after planning.
settings.json → chunkSizeWords / chunkOverlapWords).defaultTopK in settings.json).relevanceFloor.End-to-end run (defaults from plugin settings.json; override flags as needed):
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" run --inputs . --query "${ARGUMENTS}" --hybrid-mode fallback --top-k 3
Self-test:
python "${CLAUDE_PLUGIN_ROOT}/scripts/context_pipeline.py" self-test
settings.json under tokenReducer)compressionWordBudget — lower for shorter summaries (e.g. 150).chunkSizeWords / chunkOverlapWords — smaller chunks before compression (e.g. 100 / 20).defaultTopK — fewer final chunks (e.g. 3).relevanceFloor — higher values drop more weak chunks before summarization (e.g. 0.18).Session reminders: top-level promptGuard (autoCompactTurn, autoResetTurn, criticalResetTurn, reminderTurns) plus historyCompactReminderTurns inside tokenReducer.
./references/implementation-guide.md./references/context7-integration.mdnpx claudepluginhub madhan230205/token-reducer --plugin claude-token-reducerApplies token optimization rules to reduce context usage and response length. Always active, it minimizes file reads, avoids preamble, and batches tool calls.
Guides managing Claude Code context window with /compact, /clear commands, auto-compaction config, sub-agents, targeted reads, background tasks, and conversation flows for long sessions.
Teaches the four operations of context engineering — Write, Select, Compress, Isolate — for managing token budgets, compaction strategies, and context partitioning to keep AI sessions sharp and efficient.