From token-reduce
Reduce repo context cost with QMD/helper discovery, scoped rg, targeted reads, concise summaries, and AI-delegate call batching.
Slash command: `/token-reduce:token-reduce`
Use targeted retrieval and short summaries when paths are unknown, the repo is large, or the task spans multiple files. Skip exact-path tiny edits.
- `scripts/token-reduce-paths.sh topic words`
- `scripts/token-reduce-snippet.sh topic words`

If `token-savior` is installed, you may use:
- `uv run python scripts/token-reduce-structural.py --project-root . find-symbol ExactSymbol`
- `uv run python scripts/token-reduce-structural.py --project-root . change-impact ExactSymbol`

- Use `scripts/token-reduce-paths.sh` for the initial path-only kickoff.
- Use `scripts/token-reduce-snippet.sh` only when the path list is not enough.
- Do not use `qmd search` or raw `rg` as the first compliant move when the helper is available; those belong inside the helper workflow or as a narrow follow-up after helper output.
- Do not add `||`, `&&`, `find`, `ls`, or extra fallback shell logic.
- Do not treat `rg --files .` as compliant discovery.
- Do not use `find .`, `ls -R`, `grep -R`, or broad Glob patterns such as `**/*`.

| Strategy | Measured Savings | When |
|---|---|---|
| Concise responses | 89% | Always |
| QMD BM25 search | 71–83% vs broad file listing (local/composite benchmarks); much higher vs reading file contents naively | Finding which files to read |
| Targeted reads | 33% | Large files |
| Parallel calls | 20% | Independent lookups |
| Caveman-style output profile (optional companion) | 20–65% output token reduction in upstream caveman benchmarks | When the user explicitly asks for extra brevity |
| AXI companion tools (optional) | Fewer turns in upstream AXI studies for GitHub/browser tasks | When work is primarily GitHub or browser automation |
| AI delegate router (delegate-skill) | Offload bounded side work while parent agent keeps critical-path orchestration and verification | Let the router pick the delegate: devin (browser/sandbox), kimi (cheap research/review), grok (large codebase), spark (local Codex write-mode) |
| Adaptive tier router | Auto-promotes/demotes helper tier from behavior and query intent | Default first move when path is unknown (token-reduce-adaptive) |
| Context Mode companion (optional) | Up to ~98% reduction in output-heavy fixture comparisons | When tasks are dominated by huge tool payloads (logs, test output, API dumps) |
| Headroom companion (optional pilot) | 24–33% saved in local tool-result smoke tests; live proxy/MCP can reduce long-session tool context | When large tool results or old turns keep inflating the context and a verified Headroom proxy is already available |
| code-review-graph companion (optional) | 6x–10x token wins on larger-repo token-efficiency samples; can lose on tiny single-file diffs | Large monorepo review, dependency blast-radius, architecture impact tasks |
command -v qmd >/dev/null 2>&1 && qmd collection list 2>/dev/null | head -1
If unavailable, use scoped `rg`.

- `scripts/token-reduce-adaptive.sh topic words`
- `scripts/token-reduce-paths.sh topic words`
- `scripts/token-reduce-snippet.sh topic words`
- Prefer `gh-axi` or `chrome-devtools-axi` over higher-overhead interfaces when available.
- Apply a settings profile (`minimal-load`, `balanced`, `max-savings`) via `token-reduce-manage.sh settings profile apply <name>`.

Token-reduce remains the master router. Use helper-first discovery, scoped reads, QMD, RTK, and structural helpers before adding a proxy layer.
Use Headroom only when headroom --version works, headroom install status or /readyz shows a healthy local proxy, telemetry is disabled, and the task has large tool payloads, repeated log/API/test outputs, or long-session context pressure.
Do not use Headroom as the first move for unknown-path repo discovery. Do not enable --learn until memory writes are reviewed against MEMORY.md, daily memory, and gbrain policy. If the OpenClaw installer emits obsolete plugin keys such as startupTimeoutMs or gatewayProviderIds, keep the manually verified config and do not rerun headroom install apply --providers all.
Preferred checks:
headroom install status
curl -fsS http://127.0.0.1:8787/readyz
Read references/headroom-evaluation-2026-06-10.md for evidence and rollback caveats.
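The Headroom preconditions above amount to a single all-or-nothing gate. A minimal sketch of that gate, assuming a hypothetical `HeadroomStatus` record that the caller fills in after running the checks; the class and field names are illustrative and are not part of Headroom or token-reduce:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadroomStatus:
    """Hypothetical snapshot of the checks listed above (names are illustrative)."""
    version_ok: bool          # `headroom --version` exited successfully
    proxy_healthy: bool       # `headroom install status` or /readyz reported healthy
    telemetry_disabled: bool  # telemetry is confirmed off
    heavy_payloads: bool      # large tool payloads, repeated outputs, or long-session pressure
    unknown_path_discovery: bool = False  # task is still at the "find the files" stage


def should_use_headroom(status: HeadroomStatus) -> bool:
    """Return True only when every precondition holds and this is not first-move discovery."""
    if status.unknown_path_discovery:
        return False
    return all((
        status.version_ok,
        status.proxy_healthy,
        status.telemetry_disabled,
        status.heavy_payloads,
    ))
```

Any single failed check keeps the proxy out of the path, which matches the intent that Headroom is an add-on layer rather than a default.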
When the user asks for tighter responses, apply a caveman-inspired lite profile:
Do not force this style when clarity or safety would degrade. This is optional, not the default for every user.
- Use scoped `rg`, not recursive shell scans.
- `scripts/token-reduce-search.sh` uses repo-scoped QMD first, then scoped `rg`.
- `rg --files .` and similar broad inventory commands are treated as violations.

scripts/token-reduce-paths.sh topic words
scripts/token-reduce-snippet.sh topic words
scripts/token-reduce-adaptive.sh topic words
If helpers are unavailable, use `qmd search "topic" -n 5 --files` or narrowly scoped `rg -n -g '<glob>' '<pattern>'`.
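The fallback rule above can be sketched as a function that returns one argv list and never chains commands. This is an illustrative sketch: the function name and the `qmd_available` parameter are hypothetical, and only the `qmd search` and `rg` invocations come from the text.

```python
import shutil
from typing import List, Optional


def fallback_discovery_command(topic: str, glob: str, pattern: str,
                               qmd_available: Optional[bool] = None) -> List[str]:
    """Build the helper-less discovery command as a single argv list.

    qmd_available can be injected by the caller; when left as None,
    PATH is searched for a `qmd` executable.
    """
    if qmd_available is None:
        qmd_available = shutil.which("qmd") is not None
    if qmd_available:
        # BM25 search, capped at five path-only results.
        return ["qmd", "search", topic, "-n", "5", "--files"]
    # Narrow ripgrep: line numbers, restricted to one glob.
    return ["rg", "-n", "-g", glob, pattern]
```

Returning an argv list rather than a shell string keeps `||` and `&&` fallbacks out of the command itself; the choice is made once, before anything runs.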
Semantic QMD is optional, not the default token-reduce path. For cheap discovery, stay on BM25 (qmd search) unless the user explicitly asks for QMD semantic setup or BM25 misses conceptual matches. When semantic QMD is requested, run qmd embed for the relevant collection, verify with qmd status, then smoke-test qmd vsearch/qmd query. QMD embeddings are local GGUF models via node-llama-cpp (QMD reports the active model, e.g. embeddinggemma-300M), not Ollama; GBrain's Ollama embeddings are a separate vector store. On CPU-only hosts, embedding can be slow, so prefer scoped batches such as qmd embed -c <collection> --max-docs-per-batch 32 --max-batch-mb 8 and report progress. If QMD reports Session expired, Bun segfaults, or a command times out, it may still commit partial vectors; rerun with smaller batches and verify with qmd status plus a qmd vsearch/qmd query smoke test before claiming coverage.
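The advice to rerun with smaller batches can be sketched as a generator of `qmd embed` argument lists. The flags are the ones quoted above; halving the batch on each retry and the `min_docs` floor are assumptions of this sketch, since the text only says to use smaller batches.

```python
from typing import Iterator, List


def embed_attempts(collection: str, docs: int = 32, mb: int = 8,
                   min_docs: int = 4) -> Iterator[List[str]]:
    """Yield `qmd embed` argv lists with progressively smaller batches.

    Halving per retry is an assumption of this sketch, not QMD behavior.
    """
    while docs >= min_docs:
        yield ["qmd", "embed", "-c", collection,
               "--max-docs-per-batch", str(docs),
               "--max-batch-mb", str(mb)]
        docs //= 2
        mb = max(1, mb // 2)  # never drop the size cap below 1 MB
```

A caller would run each attempt in turn, stop at the first one that succeeds, and then confirm coverage with `qmd status` and a `qmd vsearch` or `qmd query` smoke test, because a failed run may still have committed partial vectors.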
Use GBrain for durable project memory and cross-session decisions; use QMD/token-reduce for current-repo discovery. Do not assume the two systems share vectors: GBrain may use Ollama embeddings, while QMD uses its own local GGUF embedding index.
Never start discovery with `find .`, `ls -R`, `grep -R`, `rg --files .`, broad Glob, or chained fallback shell logic.
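A guard for the named broad scans might look like the sketch below. The pattern list is illustrative and covers only the four commands spelled out above; it does not detect broad Glob patterns or chained fallback logic, and it is not the skill's actual enforcement code.

```python
import re

# Illustrative patterns for the broad scans named above (a sketch, not the real checker).
_BROAD_SCAN_PATTERNS = [
    re.compile(r"^\s*find\s+\.(\s|$)"),          # find .
    re.compile(r"^\s*ls\s+-\w*R"),               # ls -R, ls -laR
    re.compile(r"^\s*grep\s+-\w*R"),             # grep -R, grep -Rn
    re.compile(r"^\s*rg\s+--files\s+\.(\s|$)"),  # rg --files .
]


def is_broad_discovery(command: str) -> bool:
    """Return True when the command starts with one of the listed broad scans."""
    return any(p.search(command) for p in _BROAD_SCAN_PATTERNS)
```

Scoped commands such as `rg -n -g '*.sol' 'submit'` pass through, so the guard only blocks the repo-wide inventory forms.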
- `command -v qmd >/dev/null 2>&1 && qmd collection list 2>/dev/null | head -1`
- `scripts/token-reduce-adaptive.sh topic words`
- `scripts/token-reduce-paths.sh topic words`
- `scripts/token-reduce-snippet.sh topic words`
- `uv run python scripts/token-reduce-structural.py --project-root . find-symbol ExactSymbol`
- Scoped `rg`.
- Companion tools (`gh-axi`, `chrome-devtools-axi`, graph/review tools) are used only when installed and clearly cheaper.

See `references/feature-matrix.md` for full command/config details.
Delegate selection goes through the delegate-skill router — do not hand-pick a delegate or hardcode one wrapper. The router maps the task to the right backend and keeps the parent context small (only the result summary returns):
| Task | Router picks | Wrapper |
|---|---|---|
| Browser / UI / screenshot / sandbox | devin | devin-delegate |
| Cheap research / review / summarize / draft | kimi | kimi-delegate |
| Multi-file refactor / large codebase | grok | grok-delegate |
| Local Codex write-mode implementation | spark | /spark |
| Unknown scope | kimi to scope, then escalate | kimi-delegate |
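To make the routing table concrete, here is a toy lookup that mirrors its rows. It is only an illustration of the mapping: the keyword sets and the `pick_delegate` name are hypothetical, and the real decision belongs to the delegate-skill router, not to code like this.

```python
import re
from typing import Tuple

# (keywords, delegate, wrapper) rows mirroring the table above.
# Keyword matching is an assumption of this sketch, not the router's logic.
_ROUTES = [
    ({"browser", "ui", "screenshot", "sandbox"}, "devin", "devin-delegate"),
    ({"refactor", "multi-file", "codebase"}, "grok", "grok-delegate"),
    ({"codex", "write-mode"}, "spark", "/spark"),
    ({"research", "review", "summarize", "draft"}, "kimi", "kimi-delegate"),
]


def pick_delegate(task: str) -> Tuple[str, str]:
    """Return (delegate, wrapper) for a task description."""
    # Whole-word matching avoids false hits such as "ui" inside "build".
    words = set(re.findall(r"[a-z][a-z-]*", task.lower()))
    for keywords, delegate, wrapper in _ROUTES:
        if words & keywords:
            return delegate, wrapper
    # Unknown scope: kimi scopes the task first, then escalates.
    return "kimi", "kimi-delegate"
```

The fallback branch corresponds to the "Unknown scope" row: the cheap delegate scopes the work before anything more expensive is invoked.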
For owned workspace projects, default to the PR-backed delegation workflow:
- Pick the delegate through the `delegate-skill` router, not by hand. See `delegate-skill/SKILL.md` for the full routing table and health checks.
- Use the wrappers (`devin-delegate`, `kimi-delegate`, `grok-delegate`, `/spark`), never raw `devin`, raw `pi --provider kimi-coding`, or backgrounded delegate commands. Wrappers preserve envelope checks, fallback, and telemetry.

The call-reduction tips below apply to whichever wrapper the router selects (written here as `<delegate>-delegate`). Orchestrator-to-subagent calls have fixed overhead (envelope, fallback wiring, telemetry). Reduce by:
# BAD: 5 calls × overhead
<delegate>-delegate --task "Check zero-value guard in submit()"
<delegate>-delegate --task "Check oracle replay protection"
...
# GOOD: 1 call, 5 questions, ~70% token savings
<delegate>-delegate --task "Answer CLEAN or FINDING+file:line for each:
Q1. StakingRouter.submit(): zero-value ETH guard?
Q2. reportModuleBeaconBalance: replay protection?
Q3-Q5. ..."
# BAD (pastes 50 lines into prompt)
<delegate>-delegate --task "Review this: [code block]"
# GOOD (the delegate reads it itself — 30-70% cheaper)
<delegate>-delegate --task "Read OracleAdapter.sol:120-135. Does _validateSlashGuard
enforce a floor? CLEAN or FINDING."
Append to every task: "Answer CLEAN or FINDING+file:line. No preamble." — cuts response tokens 40-60%.
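Batching the questions and appending the answer-format suffix can be combined in one small prompt builder. A minimal sketch; `batch_task` and `ANSWER_SUFFIX` are hypothetical names, and the output is just the string that would be passed to `--task`:

```python
from typing import Sequence

ANSWER_SUFFIX = "Answer CLEAN or FINDING+file:line. No preamble."


def batch_task(questions: Sequence[str]) -> str:
    """Fold several questions into one task prompt with numbered Q-labels."""
    if not questions:
        raise ValueError("at least one question is required")
    lines = ["Answer each question below."]
    lines += [f"Q{i}. {q}" for i, q in enumerate(questions, start=1)]
    lines.append(ANSWER_SUFFIX)
    return "\n".join(lines)
```

For example, `batch_task(["StakingRouter.submit(): zero-value ETH guard?", "reportModuleBeaconBalance: replay protection?"])` yields a single prompt with `Q1.` and `Q2.` lines, so the fixed per-call overhead is paid once instead of once per question.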
./scripts/token-reduce-paths.sh "staking contracts" > /tmp/ctx.txt
<delegate>-delegate --task "..." --context-file /tmp/ctx.txt
Do not background with `&`; use the Agent tool for parallelism. `<delegate>-delegate ... 2>&1 &` writes to the terminal FD, not the task output file.
Use `Agent(description=..., prompt="Use <delegate>-delegate ...")` instead.
# --print-envelope emits the structured plan; no per-skill script paths needed
<delegate>-delegate --print-envelope --task "audit X" > /tmp/envelope.txt
<delegate>-delegate --context-file /tmp/envelope.txt --task "execute the plan above"
references/meta-learnings-2026-05-31.md

Read references/token-reduction-guide.md for benchmark notes and integration details.
Read references/delegate-skill-integration.md for how token-reduce integrates the delegate-skill router.
Read references/companion-tools.md for how to evaluate future companion backends.
Read references/graphify-evaluation.md for the graphify companion verdict.
Read references/caveman-evaluation.md for the caveman companion verdict.
Read references/headroom-evaluation-2026-06-10.md for the Headroom proxy/MCP pilot verdict.
Read references/axi-evaluation.md for the AXI companion verdict.
Read references/prompt-stack-intake-2026-04-18.md for the 10-dependency prompt-stack intake verdict and evidence.
Read references/feature-matrix.md for the complete feature/command/config/telemetry map.
Read references/meta-learnings-2026-04-18.md for validated integration lessons and guardrails.
Read references/meta-learnings-2026-04-19.md for QMD indexing/routing synchronization lessons and latency/adoption follow-ups.
Read references/meta-learnings-2026-04-25.md for telemetry-window interpretation and diagnostics normalization lessons.
Read references/meta-learnings-2026-05-06.md for telemetry-driven instrumentation and propagation workflow lessons.
Read references/meta-learnings-2026-05-20.md for docs fast-path routing and weekly maintenance automation lessons.
Read references/tier-value-profile.md for keep/conditional/excluded dependency-tier decisions.
npx claudepluginhub chimera-defi/token-reduce-skill --plugin token-reduce