Skill

session-health

Use when a long session is degrading, approaching the effective context window, looping on the same tool call, contradicting earlier decisions, or when you want to surface what went wrong at the end of a session. Fires on two timing layers — per-turn via UserPromptSubmit and session-end via Stop — over a shared signal library. Trigger phrases: "/session-health", "context is full", "I'm degrading", "stuck in a loop", "monitor context", "what went wrong", "the model got dumber", "detect failure patterns", "ambient diagnostics". Use this skill liberally on any multi-hour session — install once and it monitors silently, surfacing exactly one alert per turn when a signal fires.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/catalyst:session-health

User invocable

Model invocable

Inline context

Default effort

Supporting Files

evals/evals.json
evals/evals.md
evals/fixtures/transcript-below-warn.jsonl
evals/fixtures/transcript-clean.jsonl
evals/fixtures/transcript-repeated-bash.jsonl
evals/fixtures/transcript-stale-read.jsonl
evals/fixtures/transcript-strong-effective.jsonl
evals/fixtures/transcript-warn-effective.jsonl

SKILL.md

230 lines · ~2.9k tokens

Stats

Language: Shell

Stars: 1

Maintenance: Excellent

Last Commit: Jun 17, 2026

session-health

Two-timing detector for long Claude Code sessions. Monitors degradation in real time (per-turn, UserPromptSubmit) and audits failure patterns at the end (Stop), both over a shared POSIX bash + jq signal library. Merges session-degradation-watch (v0.6) and failure-pattern-detector (v0.5) into a single ambient skill.

Why this exists

Context rot is not a vibe. GPT-4o falls from a 99.3% baseline to 69.7% at just 32K tokens — well inside its 128K advertised window (NoLiMa, ICML 2025). RULER finds only half of 17 long-context models maintain satisfactory performance at 32K despite claiming 32K+ support (RULER, COLM 2024). The effective usable window is typically 50–70% of the advertised limit. Bigger windows don't fix this — the mechanism is an intrinsic attention budget that every token draws from (n² pairwise relationships) and a U-shaped positional bias that under-attends middle content (Lost in the Middle, TACL 2024).

The practical fix is harness-layer detection + structured reset. This skill detects; /catalyst:handoff reground resets. Neither step works alone.

Two-timing model

UserPromptSubmit hook  ──→  per-turn: 3 signals (context-pressure at 2 levels), single most-urgent alert
Stop hook             ──→  session-end: 6 failure patterns, recovery recipes
                            ↓
              hooks/lib/session-health-signals.sh  (shared signal library)

Both hooks are POSIX bash + jq only. They fail-open on infra errors (missing jq, missing lib). The shared library is sourced, not executed.
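The fail-open rule above can be sketched as follows. This is an illustration, not the skill's actual hook code: `load_signal_lib`, `run_hook`, and the `CATALYST_HOOKS_DIR` variable are hypothetical names, and only the library path `lib/session-health-signals.sh` under the hooks directory comes from this document.

```shell
# Sketch only: load_signal_lib, run_hook and CATALYST_HOOKS_DIR are
# illustrative names, not part of the skill.

load_signal_lib() {
  lib="${CATALYST_HOOKS_DIR:-hooks}/lib/session-health-signals.sh"
  # Missing jq or a missing library counts as an infra error.
  command -v jq >/dev/null 2>&1 || return 1
  [ -r "$lib" ] || return 1
  # Source (do not execute) so the library's functions land in this shell.
  . "$lib"
}

run_hook() {
  # Fail open: when the library cannot be loaded, emit nothing and succeed
  # so the session is never blocked by the monitor itself.
  if ! load_signal_lib; then
    return 0
  fi
  echo "signals loaded"
}
```

The point of the design is that a broken monitor should degrade to silence rather than to a failing hook.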

Per-turn signals (UserPromptSubmit hook)

Token source: the last assistant turn's .message.usage field (input_tokens + cache_read_input_tokens + cache_creation_input_tokens). When no usage is present (e.g. fresh session, or post-compaction turn without a usage field), the context signal is suppressed (falls through to other signals). A sanity ceiling suppresses the signal when the reported value exceeds advertised × 2 (guards against implausibly large values).
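A minimal sketch of that token lookup, assuming the transcript is a JSONL file in which assistant turns are marked with `.type == "assistant"` (that field name is an assumption; the `.message.usage` fields are the ones named above). The function names are hypothetical.

```shell
# Sketch only. The ".type == assistant" marker is an assumed transcript shape.

# last_usage_tokens FILE: print input + cache_read + cache_creation tokens
# from the last assistant turn, or nothing when that turn has no usage.
last_usage_tokens() {
  jq -rs '
    [ .[] | select(.type == "assistant") ] | last
    | .message.usage? // empty
    | (.input_tokens // 0)
      + (.cache_read_input_tokens // 0)
      + (.cache_creation_input_tokens // 0)
  ' "$1" 2>/dev/null
}

# context_tokens FILE: same value, but suppressed (no output) when usage is
# absent or when it exceeds the sanity ceiling of advertised x 2.
context_tokens() {
  tokens=$(last_usage_tokens "$1")
  [ -n "$tokens" ] || return 0
  ceiling=$(( ${CATALYST_SH_ADVERTISED_TOKENS:-200000} * 2 ))
  [ "$tokens" -le "$ceiling" ] || return 0
  printf '%s\n' "$tokens"
}
```

An empty result means "no context signal this turn", which lets the caller fall through to the other signals.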

Urgency order — exactly ONE alert fires per turn (single-alert bar):

| Priority | Signal | Threshold | Alert + recipe |
|---|---|---|---|
| 1 | context STRONG | ≥ 0.70 × effective window | "Context critically full — run `/catalyst:handoff reground` NOW" |
| 2 | context WARN | ≥ 0.50 × effective window | "Approaching effective context limit — run `/catalyst:handoff reground`" |
| 3 | stale-read | Edit on file F where last Read of F was >15 tool-use events ago | "Re-Read F before further edits to avoid old_string mismatch." |
| 4 | repeated-tool | Same tool call ×3 in last 5 turns | "Try a different approach (different command, different file, ask user)." |
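The single-alert bar amounts to a first-match-wins chain in priority order. The sketch below assumes the individual signals have already been evaluated; `pick_alert` and its argument layout are hypothetical, and the messages are abbreviated versions of the alerts in the table.

```shell
# Sketch only: pick_alert is an illustrative name.
# pick_alert TOKENS WARN STRONG STALE_FILE REPEATED
#   TOKENS      context tokens, or "" when the context signal is suppressed
#   WARN/STRONG thresholds in tokens
#   STALE_FILE  path flagged by the stale-read signal, or ""
#   REPEATED    1 when the repeated-tool signal fired, else 0
pick_alert() {
  tokens=$1 warn=$2 strong=$3 stale_file=$4 repeated=$5
  if [ -n "$tokens" ] && [ "$tokens" -ge "$strong" ]; then
    echo "context STRONG: run /catalyst:handoff reground NOW"
  elif [ -n "$tokens" ] && [ "$tokens" -ge "$warn" ]; then
    echo "context WARN: run /catalyst:handoff reground"
  elif [ -n "$stale_file" ]; then
    echo "stale-read: Re-Read $stale_file before further edits"
  elif [ "$repeated" = "1" ]; then
    echo "repeated-tool: try a different approach"
  fi
  # No branch taken means no alert this turn.
}
```

For example, `pick_alert 82000 70000 98000 src/app.ts 1` prints only the context WARN line even though two lower-priority signals also fired.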

The contradiction signal was retired on 2026-06-17 (brittle free-text match, low value); see PRINCIPLES P4.

The instruction-fade signal is kept at low priority; review it at the next model pass.

Recalibrated effective-window thresholds

The old session-degradation-watch v0.6 triggered at raw percentages of the advertised window (warn at 60% = 120K tok for a 200K model). That was model-naïve: quality degrades well before the advertised limit. This skill takes the effective usable window to be ≈70% of advertised, the upper end of the 50–70% range noted under "Why this exists". Some results are harsher still: NoLiMa reports degradation at 8K–32K for frontier models that claim 128K+.

session-health v0.7 recalibrates to fractions of the effective window:

| Level | Old trigger | New trigger | At advertised=200K |
|---|---|---|---|
| WARN | 60% of advertised (120,000 tok) | 0.50 × effective | 70,000 tok |
| STRONG | 85% of advertised (170,000 tok) | 0.70 × effective | 98,000 tok |

Effective = advertised × CATALYST_SH_EFFECTIVE_FRAC (default 0.70). Alerts now fire at 35% and 49% of the advertised window — much earlier than before, matching where quality actually starts to slip.
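The arithmetic can be checked with a few lines of shell. This is a sketch that uses the environment variables and defaults documented here; the function name `compute_thresholds` is hypothetical, and awk is used only because POSIX shell arithmetic has no fractions.

```shell
# Sketch only: compute_thresholds is an illustrative name.
# Prints "EFFECTIVE WARN STRONG" in tokens, rounded to the nearest token.
compute_thresholds() {
  advertised=${CATALYST_SH_ADVERTISED_TOKENS:-200000}
  effective_frac=${CATALYST_SH_EFFECTIVE_FRAC:-0.70}
  warn_frac=${CATALYST_SH_WARN_FRAC:-0.50}
  strong_frac=${CATALYST_SH_STRONG_FRAC:-0.70}
  awk -v a="$advertised" -v e="$effective_frac" \
      -v w="$warn_frac" -v s="$strong_frac" 'BEGIN {
    effective = a * e                     # effective = advertised x frac
    printf "%.0f %.0f %.0f\n", effective, effective * w, effective * s
  }'
}
```

With the defaults this prints `140000 70000 98000`, matching the table: 0.50 × 0.70 = 35% and 0.70 × 0.70 = 49% of the advertised 200K.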

Session-end failure patterns (Stop hook)

At session end, the Stop hook scans the last CATALYST_SH_PATTERN_WINDOW tool events (default 100) of the transcript for all 6 failure patterns from the OpenDev paper:

| Pattern | Signal | Recovery recipe |
|---|---|---|
| `repeated-tool-call` | Same Bash/Read/Grep input ≥3× in last 5 turns | "Loop on '…'. Try different approach." |
| `edit-mismatch` | ≥2 `old_string not found` errors in-window; names the failing file(s) | "Re-Read the file before next Edit." |
| `stale-read` | Edit on F where F was Written between last Read and this Edit | "Re-Read F — modified since last Read." |
| `recovery-spiral` | ≥3 consecutive re-Reads of previously-seen files | "Run `/catalyst:handoff reground` or `/clear` + handoff Resume." |
| `instruction-fade` | Same first 80 chars of user message repeated ≥2× in last 10 turns | "Re-state instruction in fresh session (handoff RECOVER)." |
| `context-drowning` | Any tool_result content >10KB; names the producing tool + KB | "For next big read, dispatch a subagent instead of inlining." |
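As an illustration of how one of these matchers could work, here is a simplified take on `repeated-tool-call`. It assumes the tool events have already been flattened to one `tool<TAB>input` line each (a hypothetical intermediate format), and it windows over the last N events rather than the last N turns, so it approximates the rule in the table rather than reproducing it.

```shell
# Sketch only: repeated_tool_calls and the tool<TAB>input line format are
# illustrative assumptions.
# repeated_tool_calls MIN_COUNT WINDOW < events
# Prints each tool<TAB>input line seen at least MIN_COUNT times in the
# last WINDOW events read from stdin.
repeated_tool_calls() {
  min_count=${1:-3}
  window=${2:-5}
  tail -n "$window" | sort | uniq -c |
    awk -v min="$min_count" '$1 >= min {
      sub(/^ *[0-9]+ /, "")   # drop the count column added by uniq -c
      print
    }'
}
```

Feeding it five events of which three are `Bash<TAB>npm test` prints that one line, which a caller could then quote in the "Loop on '…'" recipe.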

All detections are appended to .claude/session-health.log with timestamp + session ID + pattern + recovery recipe.
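A log append along these lines would record the four fields. The tab-separated layout, the UTC timestamp format, and the name `log_detection` are assumptions made for this sketch; the document does not specify the log's on-disk format.

```shell
# Sketch only: log_detection and the TSV layout are illustrative assumptions.
# log_detection LOG_PATH SESSION_ID PATTERN RECIPE
log_detection() {
  log_path=$1 session_id=$2 pattern=$3 recipe=$4
  # Logging must never fail the hook, so every error path returns 0.
  mkdir -p "$(dirname "$log_path")" 2>/dev/null || return 0
  printf '%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$session_id" "$pattern" "$recipe" \
    >> "$log_path" 2>/dev/null || return 0
}
```

One line per detection keeps the file friendly to `tail -n 20`, which is what `/session-health status` is described as printing.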

Suggest-only rule

This skill suggests; it never auto-recovers. Issue #60248 showed that in-loop auto-recovery doesn't work — the hook fires in a context where the agent can't reliably act on a recursive invocation. The recipe names the exact next step; acting on it is the agent's choice.

The canonical degradation recovery is /catalyst:handoff reground — a read-only re-injection of the current brief's goal, locked decisions, and files-to-keep-in-view into the active context (no disk write). If no brief exists yet, run WRITE first to checkpoint state, then reground. Alternatively, /catalyst:handoff split forks a braided session into N self-contained briefs when the session has accumulated multiple interleaved threads. Degradation alerts recommend one or the other (suggest-only; which to use is the agent's choice).

Composition with Tier-1 hooks

UserPromptSubmit-orient.sh — injects repo orientation. The two hooks fire additively on UserPromptSubmit; Claude Code shows both context injections. Neither overwrites the other.
Stop-commit-backstop.sh — flags uncommitted changes at session end. Both Stop hooks fire independently; neither's systemMessage overwrites the other.
PreToolUse-verify-gate.sh — gate for evidence-first writes. Orthogonal; verify-gate denials are NOT counted as failure patterns.

Configuration

Per-turn hook config: .claude/session-health-watch.json

{
  "repeated_tool_call_count": 3,
  "repeated_tool_call_window_turns": 5,
  "stale_read_max_turns": 15,
  "log_path": ".claude/session-health.log"
}

Session-end hook config: .claude/session-health.json

{
  "enabled_patterns": [
    "repeated-tool-call", "edit-mismatch", "stale-read",
    "recovery-spiral", "instruction-fade", "context-drowning"
  ],
  "thresholds": {
    "repeated_tool_call_count": 3,
    "repeated_tool_call_window_turns": 5,
    "stale_read_max_turns": 15,
    "edit_mismatch_count": 2,
    "recovery_spiral_count": 3,
    "pattern_window": 100
  },
  "log_path": ".claude/session-health.log"
}

Disable a noisy pattern by removing it from enabled_patterns. Log paths must stay inside the project dir (enforced by the hook).
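Both rules can be sketched in shell. `disable_pattern` edits the config with jq instead of by hand, and `log_path_is_contained` shows one conservative way a containment check might look. Both function names are hypothetical, and the check here (reject absolute paths and any `..` segment, ignore symlinks) is an assumption, not a description of what the real hook enforces.

```shell
# Sketch only: both function names are illustrative.

# disable_pattern CONFIG PATTERN: remove PATTERN from .enabled_patterns.
disable_pattern() {
  config=$1 pattern=$2
  tmp=$(mktemp) || return 1
  if jq --arg p "$pattern" '.enabled_patterns -= [$p]' "$config" > "$tmp"; then
    cat "$tmp" > "$config"    # overwrite in place, keeping file permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# log_path_is_contained PATH: succeed only for relative paths that cannot
# climb out of the project directory.
log_path_is_contained() {
  case "$1" in
    ''|/*) return 1 ;;                    # empty or absolute
    ..|../*|*/../*|*/..) return 1 ;;      # any parent-directory segment
  esac
  return 0
}
```

For example, `disable_pattern .claude/session-health.json instruction-fade` leaves the other five patterns enabled.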

Environment variables (override thresholds globally):

| Variable | Default | Meaning |
|---|---|---|
| `CATALYST_SH_ADVERTISED_TOKENS` | `200000` | Model's advertised context window in tokens |
| `CATALYST_SH_EFFECTIVE_FRAC` | `0.70` | Fraction of advertised window that is effective |
| `CATALYST_SH_WARN_FRAC` | `0.50` | Fraction of effective window for WARN alert |
| `CATALYST_SH_STRONG_FRAC` | `0.70` | Fraction of effective window for STRONG alert |
| `CATALYST_SH_PATTERN_WINDOW` | `100` | Tool-event window for Stop pattern matchers |
| `CATALYST_TIKTOKEN` | unset | Deprecated for context signal (no longer has effect); reserved for future use |

Configuration — `.claude/catalyst.json`

All knobs resolve env var > .claude/catalyst.json > built-in default. The file is optional; absent means defaults. The session_health section:

| Key | Env override | Default |
|---|---|---|
| `advertised_tokens` | `CATALYST_SH_ADVERTISED_TOKENS` | `200000` |
| `effective_frac` | `CATALYST_SH_EFFECTIVE_FRAC` | `0.70` |
| `warn_frac` | `CATALYST_SH_WARN_FRAC` | `0.50` |
| `strong_frac` | `CATALYST_SH_STRONG_FRAC` | `0.70` |
| `pattern_window` | `CATALYST_SH_PATTERN_WINDOW` | `100` |

Example (1M-context model):

{ "session_health": { "advertised_tokens": 1000000 } }
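The resolution order (env var, then `.claude/catalyst.json`, then built-in default) can be sketched as a small helper. `resolve_knob` and its argument order are hypothetical; the keys, env var names, and defaults are the ones in the table above.

```shell
# Sketch only: resolve_knob is an illustrative name.
# resolve_knob ENV_VALUE KEY DEFAULT [CONFIG]
#   ENV_VALUE  current value of the env override ("" when unset)
#   KEY        key inside the session_health section
#   CONFIG     config path, defaulting to .claude/catalyst.json
resolve_knob() {
  env_value=$1 key=$2 default=$3 config=${4:-.claude/catalyst.json}
  # 1. The environment variable wins when set.
  if [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
    return 0
  fi
  # 2. Otherwise use the config file, if it is readable and jq is available.
  if [ -r "$config" ] && command -v jq >/dev/null 2>&1; then
    file_value=$(jq -r --arg k "$key" \
      '.session_health[$k] // empty' "$config" 2>/dev/null)
    if [ -n "$file_value" ]; then
      printf '%s\n' "$file_value"
      return 0
    fi
  fi
  # 3. Fall back to the built-in default.
  printf '%s\n' "$default"
}
```

A call such as `resolve_knob "${CATALYST_SH_ADVERTISED_TOKENS:-}" advertised_tokens 200000` would return 1000000 with the example file above and no env override.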

Commands

| Command | What it does |
|---|---|
| `/session-health install` | Install both hooks (UserPromptSubmit + Stop) into `.claude/settings.json` |
| `/session-health uninstall` | Remove both hooks |
| `/session-health status` | Print last 20 entries from `.claude/session-health.log` |
| `/session-health patterns` | List all 6 named patterns with current enabled/disabled state |

Bad / good example

Bad — generic alert with no recipe:

CONTEXT WARN: context is getting full. Be careful.

A generic alert gets ignored. "Be careful" is not a next step.

Good — specific alert with exact recipe:

CONTEXT WARN: transcript is ~82,000 tokens (effective window 140,000 tok;
warn threshold 70,000 tok). Approaching the effective context limit —
run /catalyst:handoff reground to checkpoint progress.

Token count is read from the last assistant turn's .message.usage (input + cache_read + cache_creation). The agent can act on this immediately.

When NOT to use

Short sessions (<30 turns) — overhead doesn't pay back; hooks are inert but harmless.
CI / non-interactive runs — UserPromptSubmit never fires; Stop fires but only the pattern log matters.
Projects where false positives would distract — disable noisy patterns via enabled_patterns rather than uninstalling the whole skill.

Model evolution

This skill assumes the model doesn't natively know when it's degrading. The effective-window multiplier (0.70) reflects current frontier model behavior per NoLiMa (2025) and RULER (2024). If future Claude models ship native context-budget warnings or substantially reduce the positional-bias effect, the per-turn signal thresholds may be raised (less aggressive) or the UserPromptSubmit hook may become vestigial. The session-end pattern matchers depend only on transcript shape, not model capability, so they are more durable. Review annually or when a new flagship model lands with credible long-context benchmark data.
