Claude Code Session Analyzer
Quantitative analysis tool for Claude Code session logs. It measures thinking depth, tool usage patterns, behavioral regressions, user sentiment, and cost, replicating the methodology from anthropics/claude-code#42796.
What It Measures
| Category | Key Metrics |
|---|---|
| Thinking Depth | Redaction rate, signature-length proxy for thinking depth, Pearson correlation, time-of-day variation |
| Tool Usage | Read:Edit ratio, Research:Mutation ratio, Write % of mutations, edits-without-prior-Read %, repeated edits |
| Behavioral Signals | Reasoning loops, "simplest fix" mentions, premature stopping, self-admitted errors (all per 1K tool calls) |
| User Experience | Frustration indicators, user interrupts, positive:negative sentiment ratio, word frequency analysis |
| Cost | Token usage, API request counts, estimated Bedrock pricing, daily cost trend, cost per prompt |
| Period Comparison | Automatic split into halves with side-by-side metrics and change detection |
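The Period Comparison row can be illustrated with a minimal sketch. It assumes the split is made at the midpoint of the date-ordered session list (the tool may instead split at the midpoint of the date range), and the `date`, `reads`, and `edits` keys are hypothetical names chosen for this example.

```python
from datetime import date

def split_halves(sessions):
    """Sort sessions by date and split them into an earlier and a later half."""
    ordered = sorted(sessions, key=lambda s: s["date"])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def read_edit_ratio(sessions):
    """Aggregate Read:Edit ratio over a group of sessions (None if no edits)."""
    reads = sum(s["reads"] for s in sessions)
    edits = sum(s["edits"] for s in sessions)
    return reads / edits if edits else None

def percent_change(before, after):
    """Relative change from the first half to the second, in percent."""
    if not before:
        return None
    return (after - before) / before * 100.0

# Hypothetical per-session counts, for illustration only.
sessions = [
    {"date": date(2026, 3, 1), "reads": 60, "edits": 10},
    {"date": date(2026, 3, 8), "reads": 66, "edits": 11},
    {"date": date(2026, 3, 20), "reads": 30, "edits": 10},
    {"date": date(2026, 3, 28), "reads": 24, "edits": 8},
]
first, second = split_halves(sessions)
before = read_edit_ratio(first)
after = read_edit_ratio(second)
change = percent_change(before, after)
print(f"Read:Edit {before:.1f} -> {after:.1f} ({change:+.0f}%)")
```

The same two helpers can be reused for any other metric by swapping `read_edit_ratio` for a different aggregate.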
Quality Benchmarks
Derived from the stellaraccident analysis of 17,871 thinking blocks and 234,760 tool calls across 6,852 sessions:
| Metric | Good | Transition | Degraded |
|---|---|---|---|
| Read:Edit ratio | >6.0 | 2.0-6.0 | <2.0 |
| Write % of mutations | <5% | 5-10% | >10% |
| Edits without prior Read | <10% | 10-25% | >25% |
| Reasoning loops (per 1K tool calls) | <10 | 10-20 | >20 |
| Frustration indicators | <6% | 6-10% | >10% |
| Sentiment ratio (pos:neg) | >4:1 | 3-4:1 | <3:1 |
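As a rough sketch of how the thresholds in this table could be applied, the snippet below maps a metric value to one of the three bands. The metric keys and the `classify` helper are hypothetical names for this example, not part of the analyzer, and values that fall exactly on a boundary are treated as Transition.

```python
# (good_threshold, degraded_threshold, higher_is_better), copied from the table.
BENCHMARKS = {
    "read_edit_ratio": (6.0, 2.0, True),
    "write_pct_of_mutations": (5.0, 10.0, False),
    "edits_without_read_pct": (10.0, 25.0, False),
    "reasoning_loops_per_1k": (10.0, 20.0, False),
    "frustration_pct": (6.0, 10.0, False),
    "sentiment_ratio": (4.0, 3.0, True),
}

def classify(metric, value):
    """Return 'Good', 'Transition', or 'Degraded' for a metric value."""
    good, degraded, higher_is_better = BENCHMARKS[metric]
    if higher_is_better:
        if value > good:
            return "Good"
        if value < degraded:
            return "Degraded"
    else:
        if value < good:
            return "Good"
        if value > degraded:
            return "Degraded"
    return "Transition"

print(classify("read_edit_ratio", 6.6))
print(classify("edits_without_read_pct", 30.0))
```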
Usage
Option 1: Python Script (standalone)
No dependencies beyond Python 3.7+ standard library.
# Analyze all sessions from the last 90 days
python3 analyze_sessions.py
# Specify date range and output
python3 analyze_sessions.py --start 2026-03-01 --end 2026-04-01 --output march-report.md
# Analyze a specific sessions directory
python3 analyze_sessions.py ~/.claude/projects/my-project/ --output project-report.md
Arguments:
| Argument | Description | Default |
|---|---|---|
| [path] | Directory containing .jsonl session files (recursive) | ~/.claude/projects/ |
| --start | Start date (YYYY-MM-DD) | 90 days ago |
| --end | End date (YYYY-MM-DD) | today |
| --output | Output markdown file path | session-analysis-{date}.md |
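For readers who want to see how these arguments and defaults fit together, here is a minimal `argparse` sketch. It is an illustration of the table above, not the script's actual source; the `build_parser` name is hypothetical, and it assumes `{date}` in the default output name is today's date in ISO format.

```python
import argparse
from datetime import date, timedelta
from pathlib import Path

def build_parser(today=None):
    """Build a parser mirroring the documented arguments and defaults."""
    today = today or date.today()
    parser = argparse.ArgumentParser(prog="analyze_sessions.py")
    parser.add_argument(
        "path", nargs="?", type=Path,
        default=Path.home() / ".claude" / "projects",
        help="Directory containing .jsonl session files (searched recursively)",
    )
    parser.add_argument(
        "--start", type=date.fromisoformat,
        default=today - timedelta(days=90),
        help="Start date (YYYY-MM-DD)",
    )
    parser.add_argument(
        "--end", type=date.fromisoformat, default=today,
        help="End date (YYYY-MM-DD)",
    )
    parser.add_argument(
        "--output", default=f"session-analysis-{today.isoformat()}.md",
        help="Output markdown file path",
    )
    return parser

args = build_parser().parse_args(["--start", "2026-03-01", "--end", "2026-04-01"])
print(args.path, args.start, args.end, args.output)
```

Passing `date.fromisoformat` as the `type` lets `argparse` reject malformed dates with a usage error instead of a traceback.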
Option 2: Install as Claude Code Plugin
This repo is structured as a Claude Code plugin. Install it directly from GitHub:
/plugin install session-analyzer from github:lucemia/claude-session-analyzer
Or add it to your ~/.claude/settings.json manually:
{
  "enabledPlugins": {
    "session-analyzer": true
  }
}
Once installed, run inside Claude Code:
/analyze-sessions
/analyze-sessions ~/.claude/projects/ --start 2026-03-01 --end 2026-04-01
Option 3: Manual Slash Command Setup
Copy the command file into your Claude Code commands directory:
# User-level (available in all projects)
mkdir -p ~/.claude/commands
cp commands/analyze-sessions.md ~/.claude/commands/
# Or project-level (this project only)
mkdir -p .claude/commands
cp commands/analyze-sessions.md .claude/commands/
The slash command instructs Claude to perform the analysis interactively, using the same methodology but with the ability to adapt and explain findings in real time.
How It Works
Claude Code stores session logs as JSONL files in ~/.claude/projects/. Each line is a JSON object representing a message in the conversation:
- Assistant messages contain thinking blocks (with the signature field as a depth proxy), tool_use blocks, and text output
- User messages contain the prompt text
- Usage records contain token counts (input, output, cache read, cache creation)
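The log layout described above can be read with a short standard-library sketch. The field names used here (`message`, `content`, and a per-block `type`) are assumptions about the JSONL schema made for illustration, and `iter_records` and `content_blocks` are hypothetical helper names, not functions from the analyzer.

```python
import json
from pathlib import Path

def iter_records(root):
    """Yield one dict per valid JSON line in every .jsonl file under root."""
    for path in sorted(Path(root).rglob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or corrupt lines
                if isinstance(record, dict):
                    yield record

def content_blocks(record):
    """Return the structured content blocks of a message, or an empty list.

    Assumes blocks live under record["message"]["content"] as a list of
    dicts; plain-string content (e.g. a user prompt) yields no blocks.
    """
    message = record.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if isinstance(content, list):
        return [block for block in content if isinstance(block, dict)]
    return []
```

A caller could then count thinking or tool_use blocks with something like `sum(b.get("type") == "tool_use" for r in iter_records(root) for b in content_blocks(r))`.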
The analyzer parses these files and computes:
- Thinking depth: The length of the signature field on a thinking block correlates with the length of the thinking content (typically r > 0.95). When thinking is redacted (the content is an empty string), signature length serves as a proxy for how deeply the model reasoned.
- Read:Edit ratio: The number of file reads per file edit. When the model thinks deeply, it reads 6+ files before each edit (research-first). When thinking is shallow, the ratio drops to 1-2 reads per edit (edit-first), producing lower-quality changes.
- Behavioral signals: Text patterns in model output that indicate shallow reasoning. These are self-corrections ("oh wait", "actually"), premature stopping ("should I continue?"), ownership dodging ("not caused by my changes"), and the word "simplest" appearing when the model takes shortcuts.
- User sentiment: Word frequencies and sentiment ratios in user prompts, which shift measurably when model quality degrades, with fewer "great"/"thanks" and more "stop"/"wrong"/"read the file first".
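The four computations above can be sketched in a few small functions. This is a simplified illustration under stated assumptions, not the analyzer's implementation: the tool names `Read` and `Edit`, the regular expressions, and the positive/negative word lists are taken from the examples in the text or chosen for this sketch, and the real pattern set is likely larger.

```python
import math
import re

def pearson(xs, ys):
    """Pearson correlation, e.g. signature length vs. thinking length."""
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)

def read_edit_ratio(tool_names):
    """File reads per file edit; None when there are no edits."""
    reads = tool_names.count("Read")
    edits = tool_names.count("Edit")
    return reads / edits if edits else None

# Illustrative patterns based on the examples quoted in the text.
SIGNAL_PATTERNS = {
    "self_correction": re.compile(r"\b(?:oh wait|actually)\b", re.IGNORECASE),
    "premature_stopping": re.compile(r"should I continue\?", re.IGNORECASE),
    "ownership_dodging": re.compile(r"not caused by my changes", re.IGNORECASE),
    "simplest": re.compile(r"\bsimplest\b", re.IGNORECASE),
}

def signals_per_1k(texts, tool_call_count):
    """Pattern matches in model output, normalized per 1,000 tool calls."""
    if tool_call_count == 0:
        return {}
    return {
        name: 1000.0 * sum(len(p.findall(t)) for t in texts) / tool_call_count
        for name, p in SIGNAL_PATTERNS.items()
    }

POSITIVE = {"great", "thanks"}  # assumed word lists for this sketch
NEGATIVE = {"stop", "wrong"}

def sentiment_ratio(prompts):
    """Positive:negative word ratio in user prompts; None if no negatives."""
    words = [w for p in prompts for w in re.findall(r"[a-z']+", p.lower())]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / neg if neg else None
```

Normalizing the behavioral signals by tool-call count keeps long and short sessions comparable, which is why the benchmarks are stated per 1K tool calls.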
Sample Output