PERMAFROST
freeze the prefix · melt the bill
A Claude Code plugin that keeps DeepSeek's prompt cache hitting when Claude Code would otherwise bust it.
cache-stable passthrough proxy · deterministic tool order · env freeze+delta · cold-anchor coalescing · live hit-rate statusline · zero deps
Pointing Claude Code (CC) at DeepSeek's Anthropic-compatible endpoint is cheap,
and a vanilla session already reaches a ~90% DeepSeek cache hit rate on its own:
CC's system prompt is shared across users, so it stays warm. The cache breaks
when your tool list isn't stable. MCP servers connect and reshuffle it, and
because tools render first in the cached prefix, any reorder busts everything
from byte 0. Permafrost sits between CC and DeepSeek and rewrites the
cache-relevant bytes so the tools + system anchor stays byte-identical turn
after turn, then streams the reply straight back.
Claude Code (unchanged) ──Anthropic /v1/messages──▶ Permafrost on 127.0.0.1:8787 (freeze prefix + record hits) ──▶ DeepSeek /anthropic (both speak Anthropic, no translation)
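A minimal Python sketch of why a reorder is so expensive: it renders two requests whose tool lists differ only in order and counts how many leading characters they share, which bounds how much a prefix cache can reuse. `render_prefix`, `shared_prefix_len` and `by_name` are hypothetical helpers; `render_prefix` is an assumed stand-in for the server-side template (tools first, then system), and real caches match on tokens rather than characters, so treat this as an illustration rather than a model of DeepSeek's internals.

```python
import json

def render_prefix(tools: list[dict], system: str) -> str:
    """Render tools, then the system prompt, into one prefix string.

    ASSUMPTION: stand-in for the server's prompt template, which places
    tool definitions before the system prompt.
    """
    tool_block = "\n".join(json.dumps(t, sort_keys=True) for t in tools)
    return tool_block + "\n" + system

def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two rendered prefixes have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def by_name(tools: list[dict]) -> list[dict]:
    # Deterministic order: sort tool definitions by name.
    return sorted(tools, key=lambda t: t["name"])

tools = [
    {"name": "Bash", "description": "Run a shell command"},
    {"name": "Read", "description": "Read a file"},
    {"name": "mcp__github__search", "description": "Search GitHub"},
]
reshuffled = [tools[2], tools[0], tools[1]]  # an MCP server's tool now lands first
system = "You are Claude Code. " * 50        # stand-in for a long shared prompt

turn1 = render_prefix(tools, system)
turn2 = render_prefix(reshuffled, system)
print(shared_prefix_len(turn1, turn2), "of", len(turn1))  # diverges inside the first tool

s1 = render_prefix(by_name(tools), system)
s2 = render_prefix(by_name(reshuffled), system)
print(shared_prefix_len(s1, s2) == len(s1))  # → True
```

Unsorted, the two turns share only the JSON boilerplate before the first tool's description, so the entire system prompt behind it is unreachable for the cache; sorted, the prefixes are identical.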
Do you actually need it?
We ran bare DeepSeek and Permafrost head-to-head on the live API and read
DeepSeek's actual cache-hit token counts. The honest result cuts both ways:
| Your Claude Code session | Cache hit rate, bare DeepSeek | Cache hit rate, + Permafrost |
|---|---|---|
| Vanilla: stable tools, no MCP (10-turn task) | 89.6% | 89.6% (no difference) |
| Tool list churns: MCP servers reshuffle it | 33% | 71% |
If you run plain single-agent CC, you don't need this; DeepSeek already does
the job. Permafrost earns its keep on the second row: tool churn busts the prefix
at byte 0 and bare DeepSeek collapses to a 33% hit rate, while the deterministic
tool sort holds it at 71% (reproduce with e2e/tool_order_ab.py). So it's for
users with MCP servers, heavily customized setups whose anchor isn't the shared,
already-warm one, or parallel subagent fan-outs on cold anchors.
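Assuming the percentages in the table are cache-hit tokens divided by total prompt tokens, a minimal sketch of aggregating such a rate across turns follows. It assumes each turn's usage carries DeepSeek's `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` counts; `hit_rate` is a hypothetical helper and the per-turn numbers are made up for illustration, not measurements.

```python
from typing import Iterable, Mapping

def hit_rate(usages: Iterable[Mapping[str, int]]) -> float:
    """Token-weighted cache hit rate across turns.

    ASSUMPTION: each usage dict carries prompt_cache_hit_tokens and
    prompt_cache_miss_tokens; missing fields count as 0.
    """
    hit = miss = 0
    for u in usages:
        hit += u.get("prompt_cache_hit_tokens", 0)
        miss += u.get("prompt_cache_miss_tokens", 0)
    total = hit + miss
    return hit / total if total else 0.0

# Hypothetical per-turn usage, not measured data:
turns = [
    {"prompt_cache_hit_tokens": 0, "prompt_cache_miss_tokens": 10_000},    # cold first turn
    {"prompt_cache_hit_tokens": 9_000, "prompt_cache_miss_tokens": 1_000},
    {"prompt_cache_hit_tokens": 9_500, "prompt_cache_miss_tokens": 500},
]
print(f"{hit_rate(turns):.1%}")
```

Weighting by tokens, rather than averaging per-turn percentages, keeps a short turn from counting as much as a long one.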
On cost: DeepSeek is ~11× cheaper than Claude on list price alone (cache-hit
input: $0.028 vs $0.30 per 1M tokens). That saving is DeepSeek's, not ours, and
you get it the moment you switch endpoints. Permafrost's additional contribution
is keeping the cache hitting when it would otherwise bust: going from 33% to 71%
roughly halves the bill on a tool-churning workload, while on a vanilla session
it adds ~0%. This compares price for identical traffic; it is not a
model-quality claim, since deepseek-v4-flash is not Sonnet 4.6.
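A back-of-the-envelope sketch of the blended input price per token as a function of hit rate. Prices are expressed in units of the cache-hit price so the result doesn't depend on exact list prices; the 10× miss/hit price ratio is an assumption, so substitute DeepSeek's current pricing for real estimates. Output tokens are unaffected by caching and are left out. `blended_input_price` is a hypothetical helper.

```python
def blended_input_price(hit_rate: float, hit_price: float, miss_price: float) -> float:
    """Average price per input token when a fraction `hit_rate` is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price

# Prices in units of the cache-hit price.
# ASSUMPTION: a cache miss costs 10x a cache hit.
HIT, MISS = 1.0, 10.0

bare = blended_input_price(0.33, HIT, MISS)    # tool list churning, no Permafrost
frozen = blended_input_price(0.71, HIT, MISS)  # same workload with a stable prefix
print(f"input cost ratio: {frozen / bare:.2f}")  # → input cost ratio: 0.51
```

The larger the gap between miss and hit prices, the closer the ratio gets to (1 - 0.71) / (1 - 0.33) ≈ 0.43, which is where "roughly halves" comes from.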
Quick start
git clone https://github.com/jianzhichun/permafrost && cd permafrost
export ANTHROPIC_API_KEY=sk-your-deepseek-key
./cli/permafrost wrap # starts the proxy, sets env for the child only, execs claude
wrap sets ANTHROPIC_BASE_URL and ENABLE_TOOL_SEARCH=true for the child claude
process only; it never touches your shell environment or ~/.claude/settings.json.
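The "child only" behavior is ordinary process-environment scoping: the variables go into a copy of the environment handed to the child, so the invoking shell never sees them. A minimal Python sketch, not the actual cli/permafrost script; `child_env`, `exec_claude` and `PROXY_URL` are hypothetical names, and the `http://` scheme on the 127.0.0.1:8787 address from the diagram is an assumption.

```python
import os
from typing import Mapping

# ASSUMPTION: proxy address from the diagram; the real wrap script may differ.
PROXY_URL = "http://127.0.0.1:8787"

def child_env(parent: Mapping[str, str], proxy_url: str = PROXY_URL) -> dict[str, str]:
    """Return a copy of `parent` with the two variables wrap is described as setting."""
    env = dict(parent)  # a copy: changes here never reach the parent's environment
    env["ANTHROPIC_BASE_URL"] = proxy_url
    env["ENABLE_TOOL_SEARCH"] = "true"
    return env

def exec_claude(args: list[str]) -> None:
    """Replace the current process with `claude`, passing the overridden env."""
    # os.execvpe searches PATH for "claude" and does not return on success.
    os.execvpe("claude", ["claude", *args], child_env(os.environ))
```

A launcher would call `exec_claude(sys.argv[1:])` after starting the proxy. Because exec replaces the launcher process itself, the parent shell's environment is never modified.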
As a plugin (adds the /permafrost:* commands, the statusline, and an auto-start hook):
/plugin marketplace add jianzhichun/permafrost
/plugin install permafrost@permafrost
For a persistent setup: copy the env block from settings.example.json
into ~/.claude/settings.json, run permafrost up, then start claude normally.
(Claude Code reads its env once at launch, so set these before it starts.)
What it does
On every /v1/messages request, the pipeline
(proxy/permafrost_align.py) keeps the tools + system anchor byte-stable, then
reads DeepSeek's real prompt_cache_hit_tokens from the response. To keep the
anchor stable it:
- sorts tools deterministically, so late-binding MCP servers can't reshuffle position 0 of the prefix. (This is the step that matters most; see the tool-churn result above.)
- freezes the env block and emits deltas: pins the first-seen env/context block (cwd, date, git status) into the anchor; later turns send only the changed lines on the tail. PERMAFROST_FREEZE_ENV=0 relocates the whole block instead.
- strips cache_control and serializes canonically: DeepSeek ignores the markers, and canonical bytes remove serialization drift.
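The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of proxy/permafrost_align.py: tools are sorted by `name`, every `cache_control` key is removed at any depth, "canonical" means `json.dumps` with sorted keys and compact separators, and the env block is modeled as a list of lines. `strip_cache_control`, `align_request` and `env_delta` are hypothetical names.

```python
import json
from typing import Any

def strip_cache_control(node: Any) -> Any:
    """Return a copy of `node` with every cache_control key removed, at any depth."""
    if isinstance(node, dict):
        return {k: strip_cache_control(v) for k, v in node.items() if k != "cache_control"}
    if isinstance(node, list):
        return [strip_cache_control(v) for v in node]
    return node

def align_request(body: dict) -> bytes:
    """Canonical bytes for a /v1/messages body: markers stripped, tools sorted."""
    body = strip_cache_control(body)
    if "tools" in body:
        # Deterministic order, so a late-connecting MCP server can't move position 0.
        body["tools"] = sorted(body["tools"], key=lambda t: t["name"])
    # Canonical serialization: sorted keys, no incidental whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

def env_delta(frozen: list[str], current: list[str]) -> list[str]:
    """Lines of the current env block not present in the frozen first-seen block.

    The frozen block stays in the anchor; only these lines would go on the tail.
    Simplification: lines removed since the freeze are not reported.
    """
    seen = set(frozen)
    return [line for line in current if line not in seen]

schema = {"type": "object"}
turn1 = {
    "system": [{"type": "text", "text": "You are Claude Code.",
                "cache_control": {"type": "ephemeral"}}],
    "tools": [{"name": "Read", "input_schema": schema},
              {"name": "Bash", "input_schema": schema}],
}
turn2 = {  # same request, different tool order and marker placement
    "tools": [{"name": "Bash", "input_schema": schema},
              {"name": "Read", "input_schema": schema,
               "cache_control": {"type": "ephemeral"}}],
    "system": [{"type": "text", "text": "You are Claude Code."}],
}
print(align_request(turn1) == align_request(turn2))  # → True

frozen = ["cwd: /repo", "date: 2025-01-01", "git: clean"]
print(env_delta(frozen, ["cwd: /repo", "date: 2025-01-02", "git: clean"]))
# → ['date: 2025-01-02']
```

Stripping markers before sorting and serializing means two turns that differ only in where CC placed its cache_control breakpoints still produce the same bytes.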
Plus three runtime features (covered in depth in docs/HOW-IT-WORKS.md):