Skill

model-routing

Route each coding task to the cheapest Claude model that can do it well, and escalate only on failure. This is the cost lever ponytail-style rulesets ignore.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/occam:model-routing

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Right-sizing code cuts output tokens. Routing the task to the cheapest *capable*

SKILL.md

76 lines · ~809 tokens

Stats

LanguageJavaScript

Stars0

MaintenanceGood

Last CommitJun 15, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Model routing, the bigger cost lever

Right-sizing code cuts output tokens. Routing the task to the cheapest capable model cuts the price-per-token underneath them, and that's the larger win. A ruleset that saves 50% of output tokens on Opus is still paying 5× the price of Haiku for the same tokens.

The price reality (per 1M tokens, as of 2026-06)

Model	ID	Input	Output	Use for
Haiku 4.5	`claude-haiku-4-5`	$1	$5	trivial / mechanical
Sonnet 4.6	`claude-sonnet-4-6`	$3	$15	standard single-file work
Opus 4.8	`claude-opus-4-8`	$5	$25	hard / ambiguous / multi-file

Output is 5× input everywhere, and Opus output is 5× Haiku output. So the two biggest levers, in order, are: (1) pick the cheapest capable model, (2) emit fewer output tokens. Occam's core ruleset does (2); this skill does (1).

Classify the task, then route

Route on the hardest sub-step the task contains, not its surface size.

Haiku 4.5, trivial / mechanical. No design decisions; the answer is mechanical once you read the code.

rename / reformat / mechanical refactor across known sites
stdlib glue, one-liners, regex, simple data transforms
writing an obvious test for existing, readable logic

Sonnet 4.6, standard. One file, clear requirement, a real but bounded decision.

a single-file feature with a stated spec
a bug fix with a clear repro
straightforward API/CLI wiring

Opus 4.8, hard. Spans files, the requirement is ambiguous, or a wrong answer is expensive or hard to detect.

multi-file features / refactors that cross module boundaries
concurrency, security boundaries, data-loss-adjacent code
"figure out why X" debugging with no clean repro
anything where under-engineering would be silently wrong

Escalate, don't gamble

Start one tier below your first instinct and escalate on a concrete signal, it's cheaper to retry on a bigger model than to default everything to Opus.

Escalate when any of these fire:

the runnable check (see core ruleset) fails twice
the model self-reports low confidence or says "this is ambiguous"
the change turned out to touch more files than the tier's profile allows
a right-sizing check (inputs / scale / failure / concurrency) is genuinely in play

De-escalate the next task if the current one finished comfortably below tier.

Don't break the cache when you route

Switching models mid-conversation invalidates the prompt cache (caches are model-scoped, see caching/NOTES.md). So:

Route per task / per session, not per turn inside one session.
For a cheap sub-step inside an Opus session, spawn a subagent on Haiku rather than switching the main loop's model. The main loop keeps its cache; the subagent pays only its own (small) prefill.

One-line heuristic

Default to Sonnet. Drop to Haiku when the task is mechanical. Reach for Opus only when a wrong answer is expensive or the problem spans files. Escalate on a failed check, never on a hunch.

model-routing

Invocation

Context Preview

SKILL.md

model-routing

Invocation

Context Preview

SKILL.md

Model routing, the bigger cost lever

The price reality (per 1M tokens, as of 2026-06)

Classify the task, then route

Escalate, don't gamble

Don't break the cache when you route

One-line heuristic

Similar Skills

Model routing, the bigger cost lever

The price reality (per 1M tokens, as of 2026-06)

Classify the task, then route

Escalate, don't gamble

Don't break the cache when you route

One-line heuristic

Similar Skills