claude-session-budget

Track Claude Code's 5-hour session usage locally — and automatically pause task queues before hitting the limit.

Discovered by reverse-engineering ~/.claude/projects/**/*.jsonl
No API calls. No web scraping. Pure local file parsing.

The Problem

Claude Code enforces a rolling 5-hour session limit. When running automated task queues or background agents, the session can hit its limit mid-task with no warning.

How It Works

%%{init: {'themeVariables': {'fontSize': '13px'}}}%%
flowchart TD
    A([You run a task in Claude Code]) --> B[Claude Code logs API response<br/>to local JSONL]
    B --> C[budget_check.py hook<br/>fires before next tool call]
    C --> D[scan_window:<br/>cutoff = session start]
    D --> F[Sum TTL-aware weighted tokens<br/>over all in-window jsonl entries]
    F --> G{Usage % vs<br/>calibrated limit}
    G -->|&lt; 80%| H([✓ Proceed<br/>logs % to stderr])
    G -->|80–93%| I([⟳ Proceed + log sync notice])
    G -->|≥ 93%| J[⏸ Block dispatch until<br/>5-hour session resets]
    J -.->|wait for reset| A

    A -.->|async side effect:<br/>first time pct crosses 90%<br/>per 5h window| K[fork auto_calibrate.py<br/>in background]
    K --> L[Spawn `claude` under pty,<br/>send /usage, capture panel]
    L --> M[Parse + EWMA-merge<br/>observed % into limit]
    M -.->|refines limit<br/>for next hook firing| G

Claude Code writes every API response to local JSONL files: ~/.claude/projects//.jsonl

Each assistant message contains token counts in a usage field. The hook sums these with cost-equivalent weights (TTL-aware for cache writes) and divides by a calibrated limit to estimate session usage in real time. Two self-correction paths keep that limit honest: a structural detector for real 429 / rate_limit API errors in the JSONL, and a background worker that drives claude /usage itself when the estimate first crosses each of BUDGET_AUTO_CAL_MILESTONES (default 90%) — see Background auto-calibration below. The user never has to copy-paste anything for either path.

Why no `bridge_status` anchor?

Earlier versions used type=system, subtype=bridge_status lines as a "5h session start" anchor. That turned out to be wrong: Claude Code emits a bridge_status whenever /remote-control attaches to a new CLI session, not when the underlying 5h Max window begins. Real users open claude many times within one 5h window, so anchoring on the most recent bridge_status silently leapt the cutoff forward and reset the budget count to ~0% mid-window. bridge_status is ignored entirely.

The real session boundary instead comes from /usage: auto_calibrate.py parses its Resets HH:MM clue into a session-window anchor, and scan_window counts from the anchored session start (a plain rolling-5h cutoff is the fallback when no anchor is on file yet). See docs/internals.md → Session-window anchoring.

Token Weighting (cost-equivalent, input = 1.0)

Weights mirror Anthropic's published list-price ratios so the weighted total approximates dollar cost — the dimension the 5h Max cap actually tracks.

Token Type	Weight	Notes
input_tokens	1.00×	base
output_tokens	5.00×
cache_read_input_tokens	0.10×
`cache_creation.ephemeral_5m_input_tokens`	1.25×	default cache TTL
`cache_creation.ephemeral_1h_input_tokens`	2.00×	extended cache TTL
`cache_creation_input_tokens` (legacy)	1.25×	fallback when no TTL breakdown

The per-TTL breakdown landed in newer Claude Code builds; older jsonl entries that only carry the flat cache_creation_input_tokens field are weighted at 1.25× (the historical default and a safe lower bound). When both fields are present in the same entry, the breakdown supersedes the legacy field to avoid double-counting.

Calibration

The calibrated limit is auto-learned from real Anthropic API errors:

Every time budget_check.py runs, it inspects each in-window jsonl entry for the structural API-error signature: type=system, subtype=api_error with HTTP status=429, or any nested error.type containing rate_limit / usage_limit.
When it finds a new event, it takes the weighted token total at that moment as a real-world 100% reading.
The stored limit is EWMA-merged with the observation (default α=0.35) and written to ~/.claude/.budget_calibration.json.

Why structural matching, not text? An earlier version regex-matched "rate limit" / "limit reached" in the raw jsonl line. That picked up any user/assistant message body that mentioned the topic — including conversations debugging this very tool — and produced a self-poisoning EWMA loop that drove the calibrated limit from 63M down to 16M, causing false 100% BLOCKING. Structural signature matching eliminates that class of false positive.

session-budget

Popularity

What's Inside

README

claude-session-budget

The Problem

How It Works

Why no `bridge_status` anchor?

Token Weighting (cost-equivalent, input = 1.0)

Calibration

Confidence

Similar Plugins

caveman

claude-mem

llm-council-plugin

self-improving-agent

Popularity

Health & Quality

Similar Plugins

caveman

claude-mem

llm-council-plugin

self-improving-agent

session-budget

Popularity

What's Inside

README

claude-session-budget

The Problem

How It Works

Why no bridge_status anchor?

Token Weighting (cost-equivalent, input = 1.0)

Calibration

Confidence

Similar Plugins

caveman

claude-mem

llm-council-plugin

self-improving-agent

Popularity

Health & Quality

Similar Plugins

caveman

claude-mem

llm-council-plugin

self-improving-agent

Why no `bridge_status` anchor?