Token-Saver

Cut your AI coding costs by 60-99% on CLI output — without losing a single error message.
Token-Saver is a drop-in context-window optimizer for AI coding assistants. It compresses the verbose terminal output your agent reads — git diff, pytest, npm install, terraform plan, kubectl, docker — so you spend fewer tokens, stay under your LLM context limit, and get faster, cheaper, more focused responses.
36 specialized processors understand the tools you already use — git, pytest, jest, cargo, go, docker, kubernetes, terraform, pulumi, helm, ansible, aws, gcloud, and more. Each one knows exactly what to keep and what to discard: errors, diffs, stack traces, and actionable data stay; progress bars, passing tests, download spinners, and boilerplate go.
Compatible with Claude Code and Antigravity CLI. Zero added latency. No extra LLM calls. Fully deterministic. One install, instant savings.
Why developers use Token-Saver:
- 💸 Lower API bills — pay for signal, not noise. Typical savings of 60-99% per command.
- 🪟 Bigger effective context — fit more real work into the same context window.
- ⚡ Faster responses — less text for the model to read means quicker turnarounds.
- 🎯 Zero information loss — precision-tested so every error, diff, and warning survives.
- 🔌 Install once, forget it — works automatically in the background, no prompts to change.
- 🛡️ Private & offline — pure regex/parsing, no data ever leaves your machine.
Before & After
| Command | Raw Output | Compressed | Savings |
|---|---|---|---|
| git diff (large refactor) | 2,270 tokens | 909 tokens | 60% |
| pytest (500 tests, 2 failures) | 6,744 tokens | 308 tokens | 95% |
| npm install (220 packages) | 3,844 tokens | 4 tokens | 99% |
| terraform plan (15 resources) | 1,840 tokens | 137 tokens | 93% |
| kubectl get pods (40 pods) | 1,393 tokens | 79 tokens | 94% |
| docker compose logs (4 services) | 3,200 tokens | 480 tokens | 85% |
| helm template (12 manifests) | 2,100 tokens | 210 tokens | 90% |
Run `token-saver benchmark <command>` to measure savings on your own workloads.
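The Savings column follows from the two token counts: savings = 1 - compressed / raw. The sketch below recomputes a few rows from the table as a sanity check. `savings_pct` is a hypothetical helper, not part of Token-Saver, and the table appears to report whole percentages while this prints one decimal place.

```python
def savings_pct(raw_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed by compression."""
    if raw_tokens <= 0:
        raise ValueError("raw_tokens must be positive")
    return 100.0 * (1.0 - compressed_tokens / raw_tokens)


# Token counts copied from the table above.
rows = {
    "git diff": (2270, 909),
    "pytest": (6744, 308),
    "npm install": (3844, 4),
}

for name, (raw, compressed) in rows.items():
    print(f"{name}: {savings_pct(raw, compressed):.1f}%")
```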
Why
Every CLI command your AI assistant runs burns tokens — and most of that output is noise. A 500-line git diff, a pytest run with 200 passing tests, an npm install with 80 packages: the model only needs errors, modified files, and results. Everything else is wasted context and wasted money.
Token-Saver sits between the CLI and your AI assistant, compressing output with content-aware strategies. The model sees exactly what it needs — nothing more, nothing less. Your context window stays clean, your costs drop, and your assistant responds faster with less noise to process.
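To make "content-aware" concrete, here is a minimal sketch of the kind of filtering a test processor might apply to `pytest -v` output: failure and error lines are kept verbatim, and passing lines are collapsed into a count. The function name `compress_pytest_output` and the matching rule are illustrative assumptions, not Token-Saver's actual processor.

```python
import re

# A `pytest -v` result line that ends in PASSED, optionally followed by
# a progress marker such as "[ 33%]".
PASSED_LINE = re.compile(r"\sPASSED(\s+\[\s*\d+%\])?\s*$")


def compress_pytest_output(output: str) -> str:
    """Keep everything except passing-test lines; report how many were dropped."""
    kept = []
    passed = 0
    for line in output.splitlines():
        if PASSED_LINE.search(line):
            passed += 1
        else:
            kept.append(line)
    if passed:
        kept.append(f"[{passed} passing tests omitted]")
    return "\n".join(kept)


sample = """\
tests/test_api.py::test_login PASSED                      [ 33%]
tests/test_api.py::test_logout PASSED                     [ 66%]
tests/test_api.py::test_refresh FAILED                    [100%]
E   AssertionError: expected 200, got 401
"""
print(compress_pytest_output(sample))
```

The failing test and its assertion message come through unchanged, while the two passing lines shrink to a single summary line.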
How It Compares
Token-Saver takes a different approach from LLM-based or caching solutions — see the full comparison.
How It Works
Architecture
    CLI command --> Specialized processor --> Compressed output
                              |
                        36 processors
                (git, test, cargo, go, build,
                 lint, package_list, python_install,
                 maven_gradle, bun, network, docker,
                 kubectl, terraform, pulumi, cdktf,
                 nix, mise, env, search, system_info,
                 gh, db_query, cloud_cli, ansible,
                 helm, syslog, ssh, jq_yq, just, act,
                 structured_log, file_listing,
                 file_content, generic)
The engine (CompressionEngine) maintains a priority-ordered chain of processors.
The first processor that can handle the command (can_handle()) produces the
compressed output. GenericProcessor serves as a fallback and always matches last.
When a specialized processor doesn't achieve the minimum compression ratio (10%),
the engine tries the generic processor as a fallback before returning uncompressed output.
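The dispatch and fallback rules above can be sketched roughly as follows. Only `CompressionEngine`, `GenericProcessor`, `can_handle()`, and the 10% threshold come from this README. The `process()` method name, the `GitStatusProcessor` class, the character-based ratio, and the duplicate-line rule in the generic processor are assumptions made for illustration.

```python
from typing import List, Protocol

MIN_COMPRESSION_RATIO = 0.10  # the 10% minimum described above


class Processor(Protocol):
    def can_handle(self, command: str) -> bool: ...
    def process(self, command: str, output: str) -> str: ...  # assumed method name


class GitStatusProcessor:
    """Hypothetical specialized processor: drops git's '(use "git ...")' hints."""

    def can_handle(self, command: str) -> bool:
        return command.startswith("git status")

    def process(self, command: str, output: str) -> str:
        lines = [l for l in output.splitlines() if not l.strip().startswith("(use ")]
        return "\n".join(lines)


class GenericProcessor:
    """Fallback: matches every command, collapses consecutive duplicate lines."""

    def can_handle(self, command: str) -> bool:
        return True

    def process(self, command: str, output: str) -> str:
        kept: List[str] = []
        for line in output.splitlines():
            if not kept or kept[-1] != line:
                kept.append(line)
        return "\n".join(kept)


def compression_ratio(raw: str, compressed: str) -> float:
    """Fraction of characters removed (character-based to keep the sketch simple)."""
    return 1.0 - len(compressed) / len(raw) if raw else 0.0


class CompressionEngine:
    def __init__(self, specialized: List[Processor]) -> None:
        self.generic = GenericProcessor()
        # Priority order: specialized processors first, generic always last.
        self.chain: List[Processor] = [*specialized, self.generic]

    def compress(self, command: str, output: str) -> str:
        # The first processor whose can_handle() matches produces the result.
        first = next(p for p in self.chain if p.can_handle(command))
        result = first.process(command, output)
        if compression_ratio(output, result) >= MIN_COMPRESSION_RATIO:
            return result
        # Below the threshold: give the generic processor one attempt.
        if first is not self.generic:
            result = self.generic.process(command, output)
            if compression_ratio(output, result) >= MIN_COMPRESSION_RATIO:
                return result
        return output  # nothing helped, so return the raw output


engine = CompressionEngine([GitStatusProcessor()])
raw = (
    "Changes not staged for commit:\n"
    '  (use "git add <file>..." to update what will be committed)\n'
    "\tmodified:   app.py\n"
)
print(engine.compress("git status", raw))
```

Keeping the generic processor at the end of the same chain means an unrecognized command needs no special case: it simply falls through to the last entry.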
After the specialized processor runs, a lightweight cleanup pass (clean())
strips residual ANSI codes and collapses consecutive blank lines.
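A cleanup pass of this kind fits in a few lines. The sketch below reuses the name `clean()` from the text, but the regular expressions are assumptions: one covers CSI escape sequences (colors, cursor movement), the other collapses runs of blank lines to a single blank line.

```python
import re

# CSI escape sequences, e.g. "\x1b[31m" (red) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Three or more newlines in a row means two or more consecutive blank lines.
BLANK_RUN = re.compile(r"\n{3,}")


def clean(text: str) -> str:
    """Strip ANSI CSI codes, then collapse consecutive blank lines into one."""
    text = ANSI_CSI.sub("", text)
    return BLANK_RUN.sub("\n\n", text)


print(repr(clean("\x1b[31merror\x1b[0m\n\n\n\nnext")))
```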
Platform Integration
The two platforms use different mechanisms:
Claude Code (PreToolUse hook):
1. Claude wants to run `git status`
2. PreToolUse hook intercepts the command
3. Rewrites to: python3 wrap.py 'git status'
4. wrap.py executes the original command
5. Compresses the output
6. Claude receives the compressed version
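Steps 3 to 6 can be sketched as two small functions. This is an illustration under stated assumptions, not the contents of the real `wrap.py`: the function names are hypothetical, the hook's own input and output format is not shown, and merging stderr into the compressed text is a guess at reasonable behavior.

```python
import shlex
import subprocess
from typing import Callable


def rewrite_command(command: str, wrapper: str = "wrap.py") -> str:
    """Step 3: wrap the original command so the wrapper runs it instead."""
    # shlex.quote keeps the whole command as a single shell argument.
    return f"python3 {wrapper} {shlex.quote(command)}"


def run_wrapped(command: str, compress: Callable[[str, str], str]) -> str:
    """Steps 4-6: run the original command, then compress what it printed."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    # Assumption: stderr is merged in so error messages reach the compressor.
    return compress(command, proc.stdout + proc.stderr)


print(rewrite_command("git status"))  # python3 wrap.py 'git status'
```

Quoting matters here: without it, a command containing spaces, pipes, or redirections would be split or interpreted by the shell before the wrapper ever saw it.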