Search everything...

Stats

Actions

Available In

calibrate

Name: calibrate
Author: akshat4165

By akshat4165

Per-project model policy for Claude Code: analyzes your repo, writes a routing policy for which Claude model plans, codes, and reviews, then proves the token savings.

npx claudepluginhub akshat4165/calibrate --plugin calibrate

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents4

coder

/coder

Implement a planned change — write or edit code following an approved plan. Before invoking, read .claude/model-policy.json and pass the policy's phases.code.model as this invocation's model parameter; fall back to sonnet if no policy exists.

explorer

/explorer

Read-only codebase exploration — find files, trace call paths, summarize how a subsystem works. Before invoking, read .claude/model-policy.json and pass the policy's phases.explore.model as this invocation's model parameter; fall back to haiku if no policy exists.

planner

/planner

Architecture decisions, implementation planning, and breaking down complex features. Before invoking, read .claude/model-policy.json and pass the policy's phases.plan.model as this invocation's model parameter; fall back to opus if no policy exists.

reviewer

/reviewer

Review a diff or recent changes for bugs, security issues, and correctness. Before invoking, read .claude/model-policy.json and pass the policy's phases.review.model as this invocation's model parameter — but if the diff touches paths matched by a policy escalation rule (e.g. auth/payment/infra globs), pass the escalated model instead.

Skills2

Calibrate this project

/project

Analyze this repository and write a per-project model routing policy (.claude/model-policy.json) deciding which Claude model tier handles planning, coding, review, and exploration. Use when the user asks to calibrate the project, optimize token spend, or pick models for this repo.

Savings report

/savings

Report token spend for this project and estimate savings achieved by the model policy versus an all-top-tier baseline. Use when the user asks how many tokens the policy saved, for a spend report, or to verify calibration is working.

Hooks1

Event Hooks

1 hook across 1 event

Stats

Version0.1.3

LanguageShell

Stars0

MaintenanceExcellent

LicenseMIT

Last CommitJun 11, 2026

AddedJun 11, 2026

Actions

View on GitHub View README Plugin Marketplace JSON

Own this plugin?

Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).

Available In

calibrate

Safety Signals

Caution

Uses power tools

Uses Bash, Write, or Edit tools

README

calibrate

Your Claude Code learns each project's cheapest safe model policy — and proves the savings.

Heavy Claude users burn tokens guessing which model to use: Opus to plan? Sonnet to code? Haiku to review? The right answer depends on the project — its size, risk surface, and test coverage. calibrate has Claude analyze the repo once, write an auditable per-project routing policy, route phase work through it, and report measured savings versus running everything on the top tier.

It reasons per project, not from a template

Calibrate two different repos and you get two different policies — because the analysis is grounded in each repo's actual risk and safety net. Real output from two projects:

Phase	A security-sensitive Next.js app	A standard Python FastAPI + RAG backend
plan	opus — subtle partner-sync + Firestore-rules security, no tests to catch a bad design	sonnet — standard patterns, no architectural complexity worth Opus
code	sonnet — strong types + `next build` catch errors	sonnet — zero tests, so not cheaper than this
review	sonnet, escalates to opus on `auth/`, `partner/`, `firestore.rules`	sonnet, escalates to opus on `main.py`, `rag.py`, `ingest/**`, `config.py`
explore	haiku	haiku

The planning tier dropped from Opus to Sonnet on the simpler project — that's the whole idea: Claude already knows how hard a project is, so let it decide.

Measured savings

On a verified end-to-end run in the Next.js app — routing checked against the session logs, not the model's self-report — the policy cost $1.48 vs $2.78 all-Opus: 46.8% saved. That run deliberately touched security-sensitive code, so 3 of 4 agents escalated to Opus; a task that doesn't hit flagged paths keeps code+review on Sonnet and saves more. So ~47% is roughly the floor, on a worst-case task. (Dollar figures are subscription quota-value, not cash.)

Honest caveat: that's a single verified data point so far. The second project's savings are estimated (50–65%) pending its own measured run. Calibrate reports theoretical numbers as theoretical and measured numbers as measured — never the two confused.

How it works

/calibrate:project — Claude profiles the repo (scale, stack, risk paths, test/CI safety net; interviews you instead if the repo is empty) and writes .claude/model-policy.json: a model tier per phase, each with a one-line evidence-based rationale, plus escalation rules for risky paths and repeated failures. Tier aliases (haiku/sonnet/opus/fable) auto-upgrade as Anthropic ships new model versions.
Phase agents — planner, coder, reviewer, explorer subagents run on the policy's model for their phase, escalating when a diff touches a flagged path. Switching happens at subagent and session boundaries, never mid-conversation — so the prompt cache survives.
/calibrate:savings — parses your local usage logs, compares actual spend against an all-top-tier baseline, and reports the delta alongside quality incidents (escalations, retries, test failures) so savings claims stay honest.

Install

/plugin marketplace add akshat4165/calibrate
/plugin install calibrate@calibrate

For development: claude --plugin-dir ./calibrate. Then, inside any project: /calibrate:project.

What's proven vs. what's next

Honest status — this is early:

Routing works, verified against session logs across two projects (TypeScript + Python)
Per-project reasoning, demonstrated (the plan-tier contrast above)
Published & installable from this marketplace; claude plugin validate passes
One measured savings run (46.8%, verified)
Phase-0 question answered — a hook can't switch the main conversation's model mid-session, but .claude/settings.json cleanly sets what each new session starts on (see anthropics/claude-code#43326)
More measured runs across more languages before claiming a headline savings number
Phase 2 — the learning loop: auto-adjust the policy from observed escalations, retries, and test failures (the durable moat; not built yet)
Community marketplace submission

Known rough edges: /calibrate:savings needs read access to ~/.claude/projects/ (or ccusage) to compute actual dollars — a restricted session falls back to theoretical-only. Creating .claude/ in a fresh repo prompts for permission once.

Repository layout

View full README on GitHub

calibrate

Popularity

What's Inside

Confidence

README

calibrate

It reasons per project, not from a template

Measured savings

How it works

Install

What's proven vs. what's next

Repository layout

Similar Plugins

caveman

llm-council-plugin

self-improving-agent

ui-design

claude-mem

calibrate

It reasons per project, not from a template

Measured savings

How it works

Install

What's proven vs. what's next

Repository layout

Popularity

Health & Quality

Similar Plugins

caveman

llm-council-plugin

self-improving-agent

ui-design

claude-mem