claude-session-budget
Track Claude Code's 5-hour session usage locally — and automatically pause task queues before hitting the limit.
Discovered by reverse-engineering ~/.claude/projects/**/*.jsonl
No API calls. No web scraping. Pure local file parsing.
The Problem
Claude Code enforces a rolling 5-hour session limit. When running automated task queues or background agents, the session can hit its limit mid-task with no warning.
How It Works
%%{init: {'themeVariables': {'fontSize': '13px'}}}%%
flowchart TD
A([You run a task in Claude Code]) --> B[Claude Code logs API response<br/>to local JSONL]
B --> C[budget_check.py hook<br/>fires before next tool call]
C --> D[scan_window:<br/>cutoff = session start]
D --> F[Sum TTL-aware weighted tokens<br/>over all in-window jsonl entries]
F --> G{Usage % vs<br/>calibrated limit}
G -->|< 80%| H([✓ Proceed<br/>logs % to stderr])
G -->|80–93%| I([⟳ Proceed + log sync notice])
G -->|≥ 93%| J[⏸ Block dispatch until<br/>5-hour session resets]
J -.->|wait for reset| A
A -.->|async side effect:<br/>first time pct crosses 90%<br/>per 5h window| K[fork auto_calibrate.py<br/>in background]
K --> L[Spawn `claude` under pty,<br/>send /usage, capture panel]
L --> M[Parse + EWMA-merge<br/>observed % into limit]
M -.->|refines limit<br/>for next hook firing| G
Claude Code writes every API response to local JSONL files:
~/.claude/projects//.jsonl
Each assistant message contains token counts in a usage field. The hook
sums these with cost-equivalent weights (TTL-aware for cache writes) and
divides by a calibrated limit to estimate session usage in real time. Two
self-correction paths keep that limit honest: a structural detector for
real 429 / rate_limit API errors in the JSONL, and a background worker
that drives claude /usage itself when the estimate first crosses each
of BUDGET_AUTO_CAL_MILESTONES (default 90%) — see
Background auto-calibration
below. The user never has to copy-paste anything for either path.
Why no bridge_status anchor?
Earlier versions used type=system, subtype=bridge_status lines as a "5h
session start" anchor. That turned out to be wrong: Claude Code emits a
bridge_status whenever /remote-control attaches to a new CLI session,
not when the underlying 5h Max window begins. Real users open claude many
times within one 5h window, so anchoring on the most recent bridge_status
silently leapt the cutoff forward and reset the budget count to ~0%
mid-window. bridge_status is ignored entirely.
The real session boundary instead comes from /usage: auto_calibrate.py
parses its Resets HH:MM clue into a session-window anchor, and
scan_window counts from the anchored session start (a plain rolling-5h
cutoff is the fallback when no anchor is on file yet). See
docs/internals.md → Session-window anchoring.
Token Weighting (cost-equivalent, input = 1.0)
Weights mirror Anthropic's published list-price ratios so the weighted
total approximates dollar cost — the dimension the 5h Max cap actually
tracks.
| Token Type | Weight | Notes |
|---|
| input_tokens | 1.00× | base |
| output_tokens | 5.00× | |
| cache_read_input_tokens | 0.10× | |
cache_creation.ephemeral_5m_input_tokens | 1.25× | default cache TTL |
cache_creation.ephemeral_1h_input_tokens | 2.00× | extended cache TTL |
cache_creation_input_tokens (legacy) | 1.25× | fallback when no TTL breakdown |
The per-TTL breakdown landed in newer Claude Code builds; older jsonl
entries that only carry the flat cache_creation_input_tokens field are
weighted at 1.25× (the historical default and a safe lower bound). When
both fields are present in the same entry, the breakdown supersedes the
legacy field to avoid double-counting.
Calibration
The calibrated limit is auto-learned from real Anthropic API errors:
- Every time
budget_check.py runs, it inspects each in-window jsonl entry
for the structural API-error signature:
type=system, subtype=api_error with HTTP status=429, or any nested
error.type containing rate_limit / usage_limit.
- When it finds a new event, it takes the weighted token total at that
moment as a real-world
100% reading.
- The stored limit is EWMA-merged with the observation (default α=0.35) and
written to
~/.claude/.budget_calibration.json.
Why structural matching, not text?
An earlier version regex-matched "rate limit" / "limit reached" in the
raw jsonl line. That picked up any user/assistant message body that
mentioned the topic — including conversations debugging this very tool —
and produced a self-poisoning EWMA loop that drove the calibrated limit
from 63M down to 16M, causing false 100% BLOCKING. Structural signature
matching eliminates that class of false positive.