InferenceIQ — Prompt Token Optimizer and Saver for Claude Code

Makes Claude prompts cheaper to send and replies cheaper to receive — without changing meaning — and shows the savings on a live dashboard (UI brand: FortiInferenceIQ).

How it works - Prompt Injection

Strips filler from your prompt while preserving meaning.
Brevity directive (CONCISE) — appends a short "be brief, lead with the answer" instruction to the last turn, so replies come back shorter. The big lever: output tokens cost ~5× input.
Model routing — sends simple requests to cheaper models when possible (Haiku/Sonnet/Opus).
Semantic cache — serves repeated, non-agentic queries with no upstream call.
Unified dashboard — tracks token + $ savings across every surface, tagged per machine.

You apply it two ways: a Claude Code hook/plugin (works on any login, incl. Pro/Max OAuth) and an in-path proxy (API-key only; measures real output savings). It never breaks Claude Code's tool loop.

Archicture - Mode

Components

Path	What it is
`core-engine/optimize.py`	mechanical filler-strip + token counting + reporting; also a CLI
`core-engine/router.py`	deterministic intent → model routing (Haiku/Sonnet/Opus)
`core-engine/semcache.py`	3-layer semantic cache (non-agentic, text-only)
`core-engine/calibrate.py`	same-prompt brevity gauge (reply-trimming on vs off)
`proxy/intercept.py`	optimizing reverse proxy, port :8082
`dashboard/collector.py`	standalone metrics dashboard, port :8088
`.claude/hooks/optimize_prompt.py`	`UserPromptSubmit` hook (shipped as a plugin)
`iq`	launcher: starts the proxy and runs Claude Code through it

Install Mode Plugin — Claude Code plugin (the hook, no API key)

In Claude Code:

/plugin marketplace add svuillaume/InferenceIQ
/plugin install inferenceiq@inferenceiq

Auto-shortens your prompts and nudges shorter replies on every prompt; never blocks; works on any login. (Takes effect next session.)

Install Mode Proxy — Proxy + Claude Code

docker compose up -d --build     # start the proxy on :8082
./iq                             # run Claude Code through the proxy
docker compose down              # stop

Turn reply-trimming off: CONCISE=0 docker compose up -d intercept (on by default).
Report to a dashboard: INFERENCEIQ_DASHBOARD=http://<host>:8088 docker compose up -d intercept.

Install — dashboard only (standalone)

Depends on nothing else; one instance collects from many machines and persists totals across restarts via a volume.

cd dashboard
docker build -t iq-dashboard .
docker run -d --name iq-dashboard -p 8088:8088 -v iq_data:/data --restart unless-stopped iq-dashboard

Open: http://<host>:8088 (simple before/after view) · http://<host>:8088/full (detailed).
Reset counters: curl -XPOST http://<host>:8088/api/reset
Redeploy without losing totals: rebuild, docker stop iq-dashboard && docker rm iq-dashboard, then re-run with the same -v iq_data:/data.

How the before/after numbers are measured

Input — filler removed by the optimizer (after = billed input).
Output — median reply length with trimming on (CONCISE=1) vs off (CONCISE=0), over the concise replies served. For a true same-prompt baseline: ANTHROPIC_API_KEY=… ./core-engine/calibrate.py.
All from Anthropic's real usage object — not estimates.

🧠 Key clarification (important)

input_tokens = tokens actually sent in the request (after any preprocessing like optimization or caching layers)
output_tokens = tokens generated by the model in the response
These are the official billing + telemetry fields returned by Anthropic’s API

Configuration (proxy env)

CONCISE (1/0 reply-trimming) · ROUTE_MODELS (on/advise/off) · CACHE_ENABLED (1/0) · INFERENCEIQ_DASHBOARD (where to report; off disables) · IQ_TOKEN (shared secret for a token-protected collector) · IQ_PERSIST_PATH (dashboard persistence file, default /data/tally.json in the image).

See CLAUDE.md for architecture/safety invariants and roadmap.md for what's applied vs planned.

inferenceiq

Popularity

What's Inside

README

InferenceIQ — Prompt Token Optimizer and Saver for Claude Code

How it works - Prompt Injection

Archicture - Mode

Components

Install Mode Plugin — Claude Code plugin (the hook, no API key)

Install Mode Proxy — Proxy + Claude Code

Install — dashboard only (standalone)

How the before/after numbers are measured

Configuration (proxy env)

Confidence

Similar Plugins

llm-council-plugin

claude-mem

antigravity-bundle-web-designer

everything-claude-code