Glyphdown — token-cost reduction for Claude Code

Glyphdown is a Claude Code plugin that lowers token cost across a session without changing what the agent can see. It compacts tool-result output, removes repeated content, and steers compaction toward a dense form — all lossless by meaning, fail-open (any error passes the original through untouched), and fast (a prebuilt native binary on the hot path, Python as the portable fallback).

It is dogfooded in production daily — the author runs it on their own Claude Code traffic — and only recently open-sourced. Lossless-by-meaning and fail-open are not aspirations; they are how it has had to behave to stay on every session.

It is free for noncommercial use (PolyForm Noncommercial 1.0.0); commercial use needs a paid license — see COMMERCIAL.md.

Your agent's tool output is mostly noise — ANSI codes, repeated reads, machine chatter, verbose compaction summaries. Glyphdown strips it losslessly, on-device, before it bills — and stacks on top of Anthropic's prompt cache.

| Measured | Reduction | On |
|---|---|---|
| Tool-heavy session corpus (52 real fixtures) | −31.7% | total tokens 85,405 → 58,347 (`chars/4` proxy) |
| Large `Bash` dumps | −71.1% | the noisiest payloads |
| Instruction prose in the GLYPHDOWN-L1 dialect | −44.6% | every cached system-prompt call (Claude Opus tokens) |
| Network calls · API keys · data leaving your machine | 0 | 100% local, fail-open |

Char reduction is general; exact token % is model-specific (see General vs model-specific savings). Figures are measured, not asserted; the codec is fully open (glyphdown-core/) so the lossless behavior is verifiable, and the project does not publish numbers it has not measured.
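To make the `chars/4` proxy concrete, here is a minimal sketch of how a reduction figure of this kind could be computed for one noise source, ANSI control sequences. The function names and the sample payload are illustrative assumptions, not Glyphdown's own code or fixtures, and the real codec handles more than ANSI codes.

```python
import re

# Matches CSI escape sequences such as colour codes ("\x1b[31m") and cursor moves.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Drop terminal control sequences; visible characters are left untouched."""
    return ANSI_CSI.sub("", text)


def approx_tokens(text: str) -> int:
    """The chars/4 proxy: a rough, model-independent token estimate."""
    return len(text) // 4


def reduction_pct(before: str, after: str) -> float:
    """Percentage change in estimated tokens (negative means smaller)."""
    b, a = approx_tokens(before), approx_tokens(after)
    return 0.0 if b == 0 else round(100.0 * (a - b) / b, 1)


# Hypothetical test-runner output with bold/green colour codes on every line.
raw = "\x1b[1m\x1b[32mPASS\x1b[0m tests/test_codec.py\n" * 20
clean = strip_ansi(raw)
print(approx_tokens(raw), approx_tokens(clean), reduction_pct(raw, clean))
```

Because the proxy counts characters, it tracks character reduction exactly; the token percentage a given model actually sees can differ.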

How it works

Glyphdown hooks the request lifecycle at six points; every one fails open ({"continue": true} on any error — it can never block your input):

flowchart LR
  U([Your prompt]) --> UPS[UserPromptSubmit<br/>mode detector]
  UPS --> PRE[PreToolUse<br/>history dedup]
  PRE --> TOOL[(Tool runs)]
  TOOL --> POST[PostToolUse<br/>codec + session dedup]
  POST --> Q{compaction?}
  Q -- yes --> PC[PreCompact<br/>dense-form mandate]
  Q -- no --> R([Reply — never compressed])
  PC --> R
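The fail-open contract can be sketched as a wrapper around each hook handler. This is an illustration under assumptions: `run_fail_open` and `example_handler` are hypothetical names, and the event field checked below is only a stand-in for whatever a real handler reads. The only part taken from the text above is the `{"continue": true}` response on error.

```python
import json
from typing import Callable

# Pass-through response: tells the host to carry on as if the hook did nothing.
FAIL_OPEN = {"continue": True}


def run_fail_open(handler: Callable[[dict], dict], raw_event: str) -> str:
    """Run a hook handler; on any error, return the pass-through response."""
    try:
        event = json.loads(raw_event)
        return json.dumps(handler(event))
    except Exception:
        # Never block the session: swallow the error and let the turn proceed.
        return json.dumps(FAIL_OPEN)


def example_handler(event: dict) -> dict:
    """Hypothetical handler that insists on a field and otherwise passes through."""
    if "tool_name" not in event:
        raise KeyError("tool_name")
    return {"continue": True}


# A malformed payload and a payload missing the field both degrade to pass-through.
print(run_fail_open(example_handler, "not json"))
print(run_fail_open(example_handler, "{}"))
print(run_fail_open(example_handler, '{"tool_name": "Bash"}'))
```

Catching the broad `Exception` is deliberate here: for a hook that sits on every turn, an unhandled error is worse than a missed saving.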

It does not fight Anthropic's cache — it works on a different token bucket. Native caching discounts what is already cached; Glyphdown shrinks the turn-to-turn traffic that changes every call and therefore never caches, plus the dense form of what does get cached. The two stack:

flowchart TB
  subgraph B[What you are billed per call]
    P[Stable prefix<br/>system prompt + tools]
    T[Turn-to-turn traffic<br/>tool results, history, compaction]
  end
  P -->|Anthropic cache: up to −90%| C[cheap]
  P -->|Glyphdown dialect: −44.6% of what caches| C
  T -->|Glyphdown codec: −31.7% corpus| S[shrunk every call]
  C --> L([Lower total])
  S --> L
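A rough way to see how the two savings stack is to price one call in full-price input-token equivalents. The sketch below is arithmetic only, under loud assumptions: the token counts are made up, the whole prefix is assumed to be dialect content and a cache hit, the cache discount is taken at its stated upper bound, and cache-write costs are ignored. The −44.6% and −31.7% figures are the measured ones from the table above.

```python
def per_call_cost(prefix_tokens: float,
                  traffic_tokens: float,
                  glyphdown: bool,
                  cache_discount: float = 0.90,   # "up to" figure, assumed fully realised
                  dialect_cut: float = 0.446,     # measured on dialect content
                  codec_cut: float = 0.317) -> float:  # measured on the session corpus
    """Cost of one call, expressed as full-price input-token equivalents."""
    if glyphdown:
        prefix_tokens *= (1.0 - dialect_cut)   # denser prefix before it is cached
        traffic_tokens *= (1.0 - codec_cut)    # smaller uncached turn-to-turn traffic
    cached_prefix = prefix_tokens * (1.0 - cache_discount)
    return cached_prefix + traffic_tokens


# Hypothetical call: 10,000 prefix tokens, 5,000 tokens of fresh traffic.
cache_only = per_call_cost(10_000, 5_000, glyphdown=False)
stacked = per_call_cost(10_000, 5_000, glyphdown=True)
print(round(cache_only), round(stacked))
```

With these made-up inputs most of the remaining cost sits in the uncached traffic, which is why the codec term dominates the difference.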

The hot path is a prebuilt native binary; Python is the portable fallback:

flowchart LR
  H[PostToolUse hook] --> Qb{native binary<br/>available?}
  Qb -- yes --> Rb[Rust codec ~5 ms]
  Qb -- "no / GLYPHDOWN_RUST=0" --> Py[Python codec ~170 ms]
  Rb --> O([compacted output<br/>identical, fail-open])
  Py --> O
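The dispatch in the diagram could look roughly like the following. Only the `GLYPHDOWN_RUST=0` switch comes from the text above; the binary name, its stdin/stdout calling convention, and the stand-in Python codec are assumptions made for illustration.

```python
import os
import shutil
import subprocess
from typing import Optional

NATIVE_BIN = "glyphdown-native"  # hypothetical executable name, not confirmed by the README


def find_native() -> Optional[str]:
    """Return the native binary path, or None if disabled or not installed."""
    if os.environ.get("GLYPHDOWN_RUST", "1") == "0":
        return None
    return shutil.which(NATIVE_BIN)


def python_codec(text: str) -> str:
    """Stand-in for the portable codec: trims trailing whitespace on each line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


def compact(text: str) -> str:
    """Prefer the native path, fall back to Python, and fail open on any error."""
    try:
        binary = find_native()
        if binary is None:
            return python_codec(text)
        # Assumed convention: payload on stdin, compacted result on stdout.
        done = subprocess.run([binary], input=text, capture_output=True,
                              text=True, timeout=5, check=True)
        return done.stdout
    except Exception:
        return text  # fail-open: hand back the original untouched


os.environ["GLYPHDOWN_RUST"] = "0"   # force the Python path for this demo
print(repr(compact("build ok   \nwarnings: 0\t\n")))
```

Keeping the choice inside one function means callers never need to know which path ran, which matches the diagram's claim that both paths produce the same output.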

GLYPHDOWN-L1 — the symbolic dialect at the core

Glyphdown is named for its CoS — the symbolic notation it started as. GLYPHDOWN-L1 is a lossless prose↔dense transcoder: it rewrites verbose, repetitive instruction-style prose (system prompts, CLAUDE.md, skill and agent files) into a compact symbolic dialect the same model decodes natively, then expands it back byte-for-byte.

expand(compress(x)) == x      # byte-identical for dialect content;
                              # unrecognized text passes through untouched
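A toy transcoder shows one way the round-trip property can be made to hold for every input, including text that already contains the dialect's symbols. The phrase table, the symbols, and the escape character are invented for this sketch; the real GLYPHDOWN-L1 dialect and its encoding are not described here and may work differently.

```python
ESC = "\u241b"  # escape marker placed before literal reserved characters

# Hypothetical phrase -> symbol table (phrases must not contain symbols or ESC).
DIALECT = {
    "You MUST always ": "∀",
    "You MUST never ": "¬",
    " results in ": "→",
}
SYMBOLS = {sym: phrase for phrase, sym in DIALECT.items()}
RESERVED = set(SYMBOLS) | {ESC}


def compress(text: str) -> str:
    # 1. Escape reserved characters already in the input so they survive as literals.
    escaped = "".join(ESC + ch if ch in RESERVED else ch for ch in text)
    # 2. Replace known phrases, longest first; unrecognized text is left alone.
    for phrase in sorted(DIALECT, key=len, reverse=True):
        escaped = escaped.replace(phrase, DIALECT[phrase])
    return escaped


def expand(dense: str) -> str:
    out, i = [], 0
    while i < len(dense):
        ch = dense[i]
        if ch == ESC and i + 1 < len(dense):
            out.append(dense[i + 1])      # escaped literal: copy the next char as-is
            i += 2
            continue
        out.append(SYMBOLS.get(ch, ch))   # bare symbol -> phrase, anything else unchanged
        i += 1
    return "".join(out)


prompt = "You MUST always cite sources. You MUST never guess. Guessing results in errors."
dense = compress(prompt)
print(len(prompt), len(dense), expand(dense) == prompt)
```

The escape step is what keeps the mapping invertible: without it, an input that already contained `→` would expand into a phrase the author never wrote.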

Why it matters: the system prompt ships on every request, and dense instructions cost far fewer tokens while the model reads them just as well — so this is the only always-on, every-call saving. Measured −44.6% token reduction on dialect content (opus-dialect-validate-2026-05-31). It is the language behind the PreCompact dense-form mandate and the compress-config command — and it is where the whole project began. Every other mechanism (tool-result codec, dedup, the state-aware gate) stacks on top of it.

Model-specific dialects. Tokenizers differ per model, so the dialect is a data file the binary loads at runtime (`GLYPHDOWN_DIALECT`). You can tune or ship a dialect for your model with no rebuild; the file goes through a lossless self-check on load. `glyphdown-core dialect-export` dumps the default for editing, `glyphdown-core compress` and `glyphdown-core expand` run it directly, and `compress-config` applies it to your config files (dry-run + backup + lossless gate).
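A self-check on load might look something like this. Treat it as a sketch under assumptions: the JSON phrase-to-symbol file format, the default table, and the specific checks are all hypothetical, and only the `GLYPHDOWN_DIALECT` variable name comes from the text above.

```python
import json
import os

# Hypothetical built-in table used when no valid dialect file is supplied.
DEFAULT_DIALECT = {"You MUST always ": "∀", "You MUST never ": "¬"}


def check_dialect(table: dict) -> None:
    """Raise ValueError unless the table can be inverted without ambiguity."""
    symbols = list(table.values())
    if len(set(symbols)) != len(symbols):
        raise ValueError("two phrases share a symbol")
    for phrase, sym in table.items():
        if not phrase or not isinstance(sym, str) or len(sym) != 1:
            raise ValueError("need a non-empty phrase and a one-character symbol")
        if any(s in phrase for s in symbols):
            raise ValueError("a phrase contains a dialect symbol")


def load_dialect() -> dict:
    """Load the table named by GLYPHDOWN_DIALECT, falling back to the default."""
    path = os.environ.get("GLYPHDOWN_DIALECT")
    if not path:
        return DEFAULT_DIALECT
    try:
        with open(path, encoding="utf-8") as fh:
            table = json.load(fh)
        check_dialect(table)
        return table
    except Exception:
        # Fail open: a broken dialect file must not break the session.
        return DEFAULT_DIALECT


print(load_dialect())
```

Rejecting a table at load time is cheaper than discovering a non-invertible mapping after a config file has been rewritten.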

Quickstart — first 5 minutes

