Skill

token-reduce

Reduce repo context cost with QMD/helper discovery, scoped rg, targeted reads, concise summaries, and AI-delegate call batching.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/token-reduce:token-reduce

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Glob, Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use targeted retrieval and short summaries when paths are unknown, the repo is large, or the task spans multiple files. Skip exact-path tiny edits.

SKILL.md

250 lines · ~3.7k tokens

Stats

Language: Python

Stars: 2

Forks: 1

Maintenance: Excellent

Last Commit: Jun 18, 2026

Token Reduction Skill

Description

Use targeted retrieval and short summaries when paths are unknown, the repo is large, or the task spans multiple files. Skip exact-path tiny edits.

Triggers

The user asks to review, explore, search for context, or find where something lives.
The user asks to validate, check, verify, improve, fix, or update a skill, hook, script, or file.
The user's request implies the skill is not being used correctly or needs to work better.
You do not know the exact file path yet.
The task spans several files or areas of the repo.
Broad scans or full-file reads would likely waste context.
When maintaining this skill itself, the same narrow-discovery rules apply.

First Move

If file location is unknown, start with one standalone discovery command:
- scripts/token-reduce-paths.sh topic words
- scripts/token-reduce-snippet.sh topic words
If the exact symbol is already known and token-savior is installed, you may use:
- uv run python scripts/token-reduce-structural.py --project-root . find-symbol ExactSymbol
- uv run python scripts/token-reduce-structural.py --project-root . change-impact ExactSymbol
Prefer scripts/token-reduce-paths.sh for the initial path-only kickoff.
Prefer the structural helper only for exact symbol or dependency questions; do not use it as the default for vague repo discovery.
Use scripts/token-reduce-snippet.sh only when the path list is not enough.
Do not treat raw qmd search or raw rg as the first compliant move when the helper is available; those belong inside the helper workflow or as a narrow follow-up after helper output.
Do not chain discovery commands with ||, &&, find, ls, or extra fallback shell logic.
Do not treat rg --files . as compliant discovery.
Do not start with find ., ls -R, grep -R, or broad Glob patterns such as **/*.
After two failed discovery attempts or once the candidate set exceeds 5 files, stop expanding and ask the user to narrow the scope.
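
The first-move rules above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of the skill's own scripts: the function name `first_move_violation` and the regular expressions are illustrative assumptions, and the substring check for `||` and `&&` will also flag those tokens inside a quoted search pattern.

```python
import re
from typing import Optional

# Commands the skill names as non-compliant first moves (hypothetical encoding).
BROAD_SCANS = [
    re.compile(r"^find\s+\.(\s|$)"),          # find .
    re.compile(r"^ls\s+-\w*R"),               # ls -R, ls -laR
    re.compile(r"^grep\s+-\w*R"),             # grep -R
    re.compile(r"^rg\s+--files\s+\.(\s|$)"),  # rg --files .
]
CHAIN_TOKENS = ("||", "&&")


def first_move_violation(command: str) -> Optional[str]:
    """Return a short reason if the command breaks a first-move rule, else None."""
    stripped = command.strip()
    if any(token in stripped for token in CHAIN_TOKENS):
        return "chained discovery command"
    for pattern in BROAD_SCANS:
        if pattern.search(stripped):
            return "broad recursive scan"
    if "**/*" in stripped:
        return "broad glob pattern"
    return None
```

A standalone helper call such as `scripts/token-reduce-paths.sh topic words` passes the check, while `find . -name '*.py'` or a helper call followed by `|| rg ...` is flagged.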

Heuristics

Strategy	Measured Savings	When
Concise responses	89%	Always
QMD BM25 search	71–83% vs broad file listing (local/composite benchmarks); much higher vs reading file contents naively	Finding which files to read
Targeted reads	33%	Large files
Parallel calls	20%	Independent lookups
Caveman-style output profile (optional companion)	20–65% output token reduction in upstream caveman benchmarks	When the user explicitly asks for extra brevity
AXI companion tools (optional)	Fewer turns in upstream AXI studies for GitHub/browser tasks	When work is primarily GitHub or browser automation
AI delegate router (`delegate-skill`)	Offload bounded side work while parent agent keeps critical-path orchestration and verification	Let the router pick the delegate: devin (browser/sandbox), kimi (cheap research/review), grok (large codebase), spark (local Codex write-mode)
Adaptive tier router	Auto-promotes/demotes helper tier from behavior and query intent	Default first move when path is unknown (`token-reduce-adaptive`)
Context Mode companion (optional)	Up to ~98% reduction in output-heavy fixture comparisons	When tasks are dominated by huge tool payloads (logs, test output, API dumps)
Headroom companion (optional pilot)	24-33% saved in local tool-result smoke tests; live proxy/MCP can reduce long-session tool context	When large tool results or old turns keep inflating the context and a verified Headroom proxy is already available
code-review-graph companion (optional)	6x–10x token wins on larger-repo token-efficiency samples; can lose on tiny single-file diffs	Large monorepo review, dependency blast-radius, architecture impact tasks

Process

Check QMD once per session:

command -v qmd >/dev/null 2>&1 && qmd collection list 2>/dev/null | head -1

If unavailable, use scoped rg.
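
For tooling written in Python, the once-per-session check might look like the sketch below. It assumes that a zero exit status and non-empty output from `qmd collection list` mean QMD is usable; the shell one-liner above only prints the first line and does not define that criterion. The names `qmd_available` and `discovery_backend` are hypothetical.

```python
import shutil
import subprocess
from functools import lru_cache


@lru_cache(maxsize=1)
def qmd_available() -> bool:
    """Probe for a usable qmd once; later calls reuse the cached answer."""
    if shutil.which("qmd") is None:
        return False
    try:
        result = subprocess.run(
            ["qmd", "collection", "list"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: at least one listed collection means QMD can serve searches.
    return result.returncode == 0 and bool(result.stdout.strip())


def discovery_backend() -> str:
    """Pick QMD when it is usable, otherwise fall back to scoped rg."""
    return "qmd" if qmd_available() else "rg"
```

Caching the probe keeps the availability check out of every later discovery step, which is the point of checking once per session.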

If you know the file or keyword, use a scoped grep first, then read only the needed lines.
If you need an auto-routed kickoff, use scripts/token-reduce-adaptive.sh topic words.
If you need a low-token path-only kickoff, use scripts/token-reduce-paths.sh topic words.
If you need one ranked excerpt after the kickoff, use scripts/token-reduce-snippet.sh topic words.
If a file is large, read only the relevant section.
If the search space stays broad, stop expanding and ask the user to narrow it.
For GitHub/browser-heavy execution, prefer gh-axi or chrome-devtools-axi over higher-overhead interfaces when available.
When routing behavior should be formally constrained, apply a profile (minimal-load, balanced, max-savings) via token-reduce-manage.sh settings profile apply <name>.
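
The "read only the relevant section" step can be illustrated with a short Python sketch. The helper name `read_lines` is hypothetical; the skill itself relies on the agent's Read tool with line ranges rather than on this function.

```python
from itertools import islice
from pathlib import Path


def read_lines(path: str, start: int, end: int) -> list[str]:
    """Return lines start..end (1-indexed, inclusive) without loading the whole file."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        # islice stops reading once the requested window has been consumed.
        return [line.rstrip("\n") for line in islice(handle, start - 1, end)]
```

Pairing this with the line numbers from a scoped `rg -n` hit keeps the read to the few lines around the match.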

Headroom Companion (Pilot)

Token-reduce remains the master router. Use helper-first discovery, scoped reads, QMD, RTK, and structural helpers before adding a proxy layer.

Use Headroom only when headroom --version works, headroom install status or /readyz shows a healthy local proxy, telemetry is disabled, and the task has large tool payloads, repeated log/API/test outputs, or long-session context pressure.

Do not use Headroom as the first move for unknown-path repo discovery. Do not enable --learn until memory writes are reviewed against MEMORY.md, daily memory, and gbrain policy. If the OpenClaw installer emits obsolete plugin keys such as startupTimeoutMs or gatewayProviderIds, keep the manually verified config and do not rerun headroom install apply --providers all.

Preferred checks:

headroom install status
curl -fsS http://127.0.0.1:8787/readyz

Read references/headroom-evaluation-2026-06-10.md for evidence and rollback caveats.
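
The gating conditions in this section can be sketched as a readiness probe plus a single boolean gate. This is an illustrative sketch only: `proxy_ready` and `may_use_headroom` are hypothetical names, the URL is the one from the curl check above, and how the other three conditions are determined is left to the caller.

```python
import urllib.error
import urllib.request

READYZ_URL = "http://127.0.0.1:8787/readyz"  # same endpoint as the curl check


def proxy_ready(url: str = READYZ_URL, timeout: float = 2.0) -> bool:
    """Rough equivalent of `curl -fsS`: True only for a successful response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def may_use_headroom(
    version_ok: bool,
    proxy_ok: bool,
    telemetry_disabled: bool,
    heavy_payloads: bool,
) -> bool:
    """All four conditions from this section must hold before using Headroom."""
    return version_ok and proxy_ok and telemetry_disabled and heavy_payloads
```

Keeping the gate as a pure function makes it easy to log which condition failed before falling back to helper-first discovery.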

Output Brevity Profile (Companion)

When the user asks for tighter responses, apply a caveman-inspired lite profile:

remove pleasantries, hedging, and filler
keep technical terms exact
keep code blocks, commands, and error text exact
prefer short, direct statements over narrative framing

Do not force this style when clarity or safety would degrade. This is optional, not the default for every user.

Success Criteria

Discovery starts with a helper script, QMD BM25, or scoped rg, not recursive shell scans.
scripts/token-reduce-search.sh uses repo-scoped QMD first, then scoped rg.
rg --files . and similar broad inventory commands are treated as violations.
Reads stay targeted.
Final summaries cite only the minimum files needed.
Repo-level instructions and hooks point at the same first-move workflow.
Non-trivial owned-workspace changes end on a feature branch with a PR for review and backup.

QMD

scripts/token-reduce-paths.sh topic words
scripts/token-reduce-snippet.sh topic words
scripts/token-reduce-adaptive.sh topic words

If helpers are unavailable, use qmd search "topic" -n 5 --files or narrowly scoped rg -n -g '<glob>' '<pattern>'.
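
When the fallback commands are launched from Python, building them as argument lists avoids shell quoting problems. A minimal sketch follows; the function names are hypothetical, the flags are the ones quoted above, and the list of rejected globs is an assumption about what counts as too broad.

```python
TOO_BROAD_GLOBS = {"", "*", "**", "**/*"}  # assumption: treated as unscoped


def qmd_fallback(topic: str, limit: int = 5) -> list[str]:
    """Argument list for: qmd search "topic" -n 5 --files"""
    return ["qmd", "search", topic, "-n", str(limit), "--files"]


def rg_fallback(pattern: str, glob: str) -> list[str]:
    """Argument list for: rg -n -g '<glob>' '<pattern>'"""
    if glob.strip() in TOO_BROAD_GLOBS:
        raise ValueError("glob is too broad for scoped discovery")
    return ["rg", "-n", "-g", glob, pattern]
```

Either list can be passed directly to `subprocess.run` without `shell=True`.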

Semantic QMD is optional, not the default token-reduce path. For cheap discovery, stay on BM25 (qmd search) unless the user explicitly asks for QMD semantic setup or BM25 misses conceptual matches.

When semantic QMD is requested, run qmd embed for the relevant collection, verify with qmd status, then smoke-test qmd vsearch/qmd query. QMD embeddings are local GGUF models via node-llama-cpp (QMD reports the active model, e.g. embeddinggemma-300M), not Ollama; GBrain's Ollama embeddings are a separate vector store.

On CPU-only hosts, embedding can be slow, so prefer scoped batches such as qmd embed -c <collection> --max-docs-per-batch 32 --max-batch-mb 8 and report progress. If QMD reports Session expired, Bun segfaults, or a command times out, it may still commit partial vectors; rerun with smaller batches and verify with qmd status plus a qmd vsearch/qmd query smoke test before claiming coverage.

Use GBrain for durable project memory and cross-session decisions; use QMD/token-reduce for current-repo discovery. Do not assume the two systems share vectors: GBrain may use Ollama embeddings, while QMD uses its own local GGUF embedding index.

Never start discovery with find ., ls -R, grep -R, rg --files ., broad Glob, or chained fallback shell logic.

Flow

Check QMD once: command -v qmd >/dev/null 2>&1 && qmd collection list 2>/dev/null | head -1.
Known keyword/path: scoped search, then read only needed ranges.
Need an auto-routed kickoff: scripts/token-reduce-adaptive.sh topic words.
Unknown path: scripts/token-reduce-paths.sh topic words.
Need one excerpt: scripts/token-reduce-snippet.sh topic words.
Exact symbol impact: uv run python scripts/token-reduce-structural.py --project-root . find-symbol ExactSymbol.
More than five likely files or two failed searches: stop expanding and ask for narrower scope.
Final response: cite only the files needed to explain the result.
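
The stop rule in the flow above reduces to two thresholds. A minimal sketch, with hypothetical names, assuming "two failed searches" means two or more and "more than five" is a strict comparison:

```python
MAX_FAILED_ATTEMPTS = 2   # two failed discovery attempts
MAX_CANDIDATE_FILES = 5   # stop once the candidate set exceeds this


def should_ask_to_narrow(failed_attempts: int, candidate_files: int) -> bool:
    """True when discovery should stop expanding and ask the user for a narrower scope."""
    return (
        failed_attempts >= MAX_FAILED_ATTEMPTS
        or candidate_files > MAX_CANDIDATE_FILES
    )
```

With these thresholds, one failed search and five candidate files still allows another targeted step; a sixth candidate or a second failure does not.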

Success

Discovery starts with helper/QMD/scoped rg.
Large files are read in slices.
Output is concise unless the user asks for depth.
Optional companions (gh-axi, chrome-devtools-axi, graph/review tools) are used only when installed and clearly cheaper.

See references/feature-matrix.md for full command/config details.

AI Delegate Call Reduction (via the delegate-skill router)

Delegate selection goes through the delegate-skill router — do not hand-pick a delegate or hardcode one wrapper. The router maps the task to the right backend and keeps the parent context small (only the result summary returns):

Task	Router picks	Wrapper
Browser / UI / screenshot / sandbox	devin	`devin-delegate`
Cheap research / review / summarize / draft	kimi	`kimi-delegate`
Multi-file refactor / large codebase	grok	`grok-delegate`
Local Codex write-mode implementation	spark	`/spark`
Unknown scope	kimi to scope, then escalate	`kimi-delegate`
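
As an illustration of the table above, the mapping can be written as a lookup with a kimi default for unknown scope. This is not the delegate-skill router itself: the category keys (`browser`, `research`, `refactor`, `local-write`) are hypothetical labels introduced here, and only the delegate and wrapper names come from the table.

```python
# (delegate, wrapper) per task category; keys are illustrative labels only.
ROUTES: dict[str, tuple[str, str]] = {
    "browser": ("devin", "devin-delegate"),      # browser / UI / screenshot / sandbox
    "research": ("kimi", "kimi-delegate"),       # cheap research / review / summarize / draft
    "refactor": ("grok", "grok-delegate"),       # multi-file refactor / large codebase
    "local-write": ("spark", "/spark"),          # local Codex write-mode implementation
}
UNKNOWN_SCOPE = ("kimi", "kimi-delegate")        # scope with kimi first, then escalate


def route(category: str) -> tuple[str, str]:
    """Return the (delegate, wrapper) pair for a category, defaulting to kimi."""
    return ROUTES.get(category, UNKNOWN_SCOPE)
```

In practice the selection should still go through the delegate-skill router; the sketch only shows the shape of the decision.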

For owned workspace projects, default to the PR-backed delegation workflow:

Keep the parent agent as orchestrator, integrator, and final verifier.
Pick the delegate with the delegate-skill router, not by hand. See delegate-skill/SKILL.md for the full routing table and health checks.
Always call the wrapper the router names (devin-delegate, kimi-delegate, grok-delegate, /spark) — never raw devin, raw pi --provider kimi-coding, or backgrounded delegate commands. Wrappers preserve envelope checks, fallback, and telemetry.
Give every delegate the workspace path, scope, constraints, acceptance checks, and expected output. Prefer batched questions and file references over pasted code.
For non-trivial owned-repo changes, stage only relevant files, run the repo's validation, push a feature branch, and open a PR so the work is backed up off the server.
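
The five items every delegate should receive (workspace path, scope, constraints, acceptance checks, expected output) can be captured in a small structure so that none is forgotten. The class name `DelegateBrief`, its field names, and the rendered layout are assumptions for illustration; the wrappers do not require this format.

```python
from dataclasses import dataclass, field


@dataclass
class DelegateBrief:
    """Hypothetical container for the fields a delegate task should carry."""

    workspace: str
    scope: str
    expected_output: str
    constraints: list[str] = field(default_factory=list)
    acceptance_checks: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Produce a compact task string that references files instead of pasting code."""
        lines = [f"Workspace: {self.workspace}", f"Scope: {self.scope}"]
        lines.append("Constraints:")
        lines += [f"- {item}" for item in self.constraints]
        lines.append("Acceptance checks:")
        lines += [f"- {item}" for item in self.acceptance_checks]
        lines.append(f"Expected output: {self.expected_output}")
        return "\n".join(lines)
```

The rendered string would be passed as the `--task` value of whichever wrapper the router names.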

The call-reduction tips below apply to whichever wrapper the router selects (written here as <delegate>-delegate). Orchestrator-to-subagent calls have fixed overhead (envelope, fallback wiring, telemetry). Reduce by:

1. Batch — 5 questions per call, not 1

# BAD: 5 calls × overhead
<delegate>-delegate --task "Check zero-value guard in submit()"
<delegate>-delegate --task "Check oracle replay protection"
...

# GOOD: 1 call, 5 questions, ~70% token savings
<delegate>-delegate --task "Answer CLEAN or FINDING+file:line for each:
Q1. StakingRouter.submit(): zero-value ETH guard?
Q2. reportModuleBeaconBalance: replay protection?
Q3-Q5. ..."
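
A batched task string like the GOOD example can be assembled from a list of questions. A minimal sketch with a hypothetical helper name, reusing the header line from the example above:

```python
BATCH_HEADER = "Answer CLEAN or FINDING+file:line for each:"


def batch_task(questions: list[str]) -> str:
    """Combine several questions into one numbered task for a single delegate call."""
    if not questions:
        raise ValueError("at least one question is required")
    numbered = [f"Q{index}. {question}" for index, question in enumerate(questions, start=1)]
    return "\n".join([BATCH_HEADER, *numbered])
```

The result is passed once as the `--task` value, so the fixed per-call overhead is paid a single time for the whole batch.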

2. Reference, don't quote

# BAD (pastes 50 lines into prompt)
<delegate>-delegate --task "Review this: [code block]"

# GOOD (the delegate reads it itself — 30-70% cheaper)
<delegate>-delegate --task "Read OracleAdapter.sol:120-135. Does _validateSlashGuard
enforce a floor? CLEAN or FINDING."

3. Constrain output format

Append to every task: "Answer CLEAN or FINDING+file:line. No preamble." — cuts response tokens 40-60%.
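
A constrained format also makes replies machine-checkable. The sketch below assumes a reply's first line is either `CLEAN` or `FINDING` followed by `file:line`, with or without a literal `+`; the source does not pin down the exact reply syntax, so the pattern and the `parse_answer` name are assumptions.

```python
import re
from typing import Optional

# Accepts "FINDING path:12" and "FINDING+path:12" (assumed reply shapes).
FINDING = re.compile(r"^FINDING\s*\+?\s*(?P<file>\S+?):(?P<line>\d+)")


def parse_answer(text: str) -> tuple[str, Optional[str], Optional[int]]:
    """Classify a delegate reply as CLEAN, FINDING (with location), or UNPARSED."""
    stripped = text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    if first_line.startswith("CLEAN"):
        return ("CLEAN", None, None)
    match = FINDING.match(first_line)
    if match:
        return ("FINDING", match["file"], int(match["line"]))
    return ("UNPARSED", None, None)
```

An `UNPARSED` result is a signal that the delegate ignored the format and the reply needs a manual look rather than a guess.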

4. Pre-compress context before delegating

./scripts/token-reduce-paths.sh "staking contracts" > /tmp/ctx.txt
<delegate>-delegate --task "..." --context-file /tmp/ctx.txt

5. Never background with `&` — use Agent tool for parallelism

<delegate>-delegate ... 2>&1 & writes to terminal FD, not the task output file. Use Agent(description=..., prompt="Use <delegate>-delegate ...") instead.

6. Build the envelope with the wrapper to reduce in-model planning

# --print-envelope emits the structured plan; no per-skill script paths needed
<delegate>-delegate --print-envelope --task "audit X" > /tmp/envelope.txt
<delegate>-delegate --context-file /tmp/envelope.txt --task "execute the plan above"

Details: `references/meta-learnings-2026-05-31.md`

Read references/token-reduction-guide.md for benchmark notes and integration details.
Read references/delegate-skill-integration.md for how token-reduce integrates the delegate-skill router.
Read references/companion-tools.md for how to evaluate future companion backends.
Read references/graphify-evaluation.md for the graphify companion verdict.
Read references/caveman-evaluation.md for the caveman companion verdict.
Read references/headroom-evaluation-2026-06-10.md for the Headroom proxy/MCP pilot verdict.
Read references/axi-evaluation.md for the AXI companion verdict.
Read references/prompt-stack-intake-2026-04-18.md for the 10-dependency prompt-stack intake verdict and evidence.
Read references/feature-matrix.md for the complete feature/command/config/telemetry map.
Read references/meta-learnings-2026-04-18.md for validated integration lessons and guardrails.
Read references/meta-learnings-2026-04-19.md for QMD indexing/routing synchronization lessons and latency/adoption follow-ups.
Read references/meta-learnings-2026-04-25.md for telemetry-window interpretation and diagnostics normalization lessons.
Read references/meta-learnings-2026-05-06.md for telemetry-driven instrumentation and propagation workflow lessons.
Read references/meta-learnings-2026-05-20.md for docs fast-path routing and weekly maintenance automation lessons.
Read references/tier-value-profile.md for keep/conditional/excluded dependency-tier decisions.
