A Claude Code plugin that shows exactly where your AI coding session wasted tokens — and how to fix it.
It flags waste, never quality. It runs on your machine. It speaks tokens, not dollars.
−45% tokens · −44% turns · 4/6 → 6/6 correct
measured in a live A/B — 6 sessions per arm, 12 total · see the method ↓
Runs as an external local engine — zero added context overhead in your own session.
Install ·
What it catches ·
Numbers ·
The story ·
How it works ·
FAQ
Install
About a minute. Two commands. The first adds the jimmy-tools marketplace from this repo (Jimmynycu/token-efficiency); the second installs the plugin from it. The @jimmy-tools handle is the marketplace name declared in .claude-plugin/marketplace.json — copy both lines exactly.
# 1) add the marketplace (registers as: jimmy-tools)
claude plugin marketplace add Jimmynycu/token-efficiency
# 2) install the plugin from that marketplace
claude plugin install token-efficiency-coach@jimmy-tools
Prefer the interactive UI? Run /plugin marketplace add Jimmynycu/token-efficiency, then /plugin install token-efficiency-coach@jimmy-tools — or just /plugin and pick it from Discover.
[!NOTE]
What "runs locally" means. The coach does all its analysis on your machine — no session content ever leaves. The install step above is the one exception: marketplace add fetches this repo over the network and may ask you to confirm trusting the marketplace before it registers. Once installed, nothing else phones home.
[!TIP]
What you get, immediately: a live token-first statusline (ctx tokens · cache% · ⚠), an automatic non-blocking coach that summarizes waste when a session ends, and an on-demand /token-efficiency-coach:coach you can run any time.
That's it. No API keys, no account. The coach reads your local session and talks back.
What it catches
The coach scans a session for waste — the tokens you paid for and didn't need. It does not grade whether your code or answers were "good." Eight things it flags:
- Context bloat — the window quietly ballooning turn over turn
- Uncached context — paying full freight for context that could have been cached
- Oversized tool output — a single tool dumping a wall of tokens into the window
- Output-heavy turns — generation spend that ran hotter than the task needed
- Failed tool calls — errors you paid to send and paid to read
- Wrong-tier routing — a heavyweight model doing featherweight work
- Redundant reads — re-reading the same file the model already has
- Very long single thread — one sprawling session that never compacted or split
[!NOTE]
Every flag is a token flag. The coach never says "your code is bad." It says "this part cost more than it had to, here's the cheaper path."
Numbers
We ran a live A/B: 6 sessions per arm, 12 real headless claude sessions total, one fixed task, the actual work held constant. Coached vs. baseline:
| Metric | Baseline | Coached | Change |
|---|
| Mean tokens / session | 133,146 | 72,834 | −45% |
| Mean turns / session | 4.5 | 2.5 | −44% |
| Task correctness | 4 / 6 | 6 / 6 | +2 solved (4/6 → 6/6) |