Tokenmin Scanner
The public, Apache-2.0 audit copy of Tokenmin. This is the code that decides
what (if anything) leaves your machine when you run tokenmin. About 5 minutes
of reading, end to end.
The deal: Tokenmin is free during the friends-and-family preview in exchange for
your anonymized usage data. The scanner that does the anonymization is open
precisely so the trade is verifiable, not just promised. Read it. Diff it
against your install. Then decide if you trust the bargain.
curl --proto '=https' --tlsv1.2 -fsSL https://tokenmin.ai/install.sh | bash
What you get in the first 60 seconds
~ tokenmin
▶ scanning ~/.claude
✓ found 57 sessions in last 14 days
✓ anonymized
✓ analyzed
Tokenmin Claude usage audit
────────────────────────────────────────────────────────────────────────
scanned 57 sessions over 14 days
est. spend (window): $6,860
model mix: Opus 99% · Sonnet 1%
────────────────────────────────────────────────────────────────────────
Headline ~$7,151/mo recoverable across 7 fix(es), ~4.8 hrs total
1. A lot of your spend is on Opus — route by tier
$$$$ ▮▮▮▮▮▮▮▮▮▮ $7,055/mo 0.1 hrs · conf 55% · model routing
evidence: 100% of $6,860 weekly spend on Opus across 52 sessions.
→ tokenmin show model_overspend
...
A rich terminal card with the headline dollar figure, ranked findings,
severity pills, per-finding next-action. Then tokenmin show <id> drills
into one finding's evidence + fix. Then tokenmin watch runs a live
dashboard while you work.
What's in here, what isn't
| Concern | Where |
|---|
Walking ~/.claude sessions, settings, agents, skills, MCP config | This repo — skills/tokenmin/analyzer.py |
| Parsing claude.ai / Claude Desktop chat exports | This repo — skills/tokenmin/analyzer_chat_export.py |
| Anonymization (paths, secrets, labels, identifiers) | This repo — skills/tokenmin/anonymize.py |
| Orchestrator CLI: collect → anonymize → submit → render | This repo — skills/tokenmin/tokenmin.py |
| Wrapper script + auto-update + version + doctor | This repo — tokenmin |
| Tests + CI | This repo — tests/, .github/workflows/ci.yml |
| Detection rules, scoring, report rendering | Not here. Lives in proprietary watsonrm/tokenmin-core. |
| Hosted submission server | Not here. Bundled in the F&F preview. |
This repo is scanner-only. Running it produces an anonymized snapshot. It
does not produce a report (that's the engine's job). The scanner is fully
functional without the engine — you can write the snapshot to disk with
--snapshot snap.json and audit what would be sent before deciding to submit
anywhere.
Install
Quick (trusts the network all the way to GitHub):
curl --proto '=https' --tlsv1.2 -fsSL https://tokenmin.ai/install.sh | bash
Verify-then-run (recommended if you don't trust the network all the way to GitHub):
curl --proto '=https' --tlsv1.2 -fsSL -o install.sh https://tokenmin.ai/install.sh
curl --proto '=https' --tlsv1.2 -fsSL -o install.sh.sha256 https://tokenmin.ai/install.sh.sha256
shasum -a 256 -c install.sh.sha256
less install.sh
bash install.sh
The installer detects every Claude variant on your machine (Code / Desktop on
macOS / Linux / Windows), drops a single tokenmin command on PATH, and offers
to add it to your shell rc with consent. No gh, no brew, no auth setup.
After install:
tokenmin --selfcheck # see the anonymizer rules without reading Python
tokenmin # scan + render inline (the magic moment)
tokenmin watch # live dashboard
tokenmin show <id> # drill into one finding
tokenmin help # 30-second walkthrough
Trust posture
Hashes are HMAC-keyed, not raw SHA-256
Identifiers (file paths, project names, MCP server names, custom agent /
skill / command names) hash with HMAC-SHA256 keyed by a 32-byte salt
generated on first run at ~/.tokenmin/.salt (chmod 0600, refuses to
overwrite via O_EXCL). Output is 16 hex chars (64 bits) — collision-resistant
for any realistic corpus.
An adversary who guesses common path names like ~/.ssh/known_hosts cannot
precompute its hash without your salt. Cross-snapshot correlation works within
your install (so the engine can flag "same file re-read 12×"); cross-user
correlation is broken.
Stricter mode: TOKENMIN_STRICT_ANONYMIZE=1 adds an additional per-run salt.
Breaks within-user cross-run correlation too at the cost of losing
across-days findings.
Defense-in-depth on inputs
Pathological JSONL inputs (oversized lines, regex-bomb strings, malformed
JSON) can't hang the scrubber: every regex sees inputs truncated to 64 KiB
max; bad lines are dropped, not raised on; recursion depth is capped.