From traces
Analyze execution traces to cluster failure patterns and propose harness improvements (rules, hooks, skills)
Slash command
/traces:trace-evolve
Periodic harness evolution skill inspired by NeoSigma's self-improving loop (39.3% improvement from failure mining + clustering + gated changes). Run weekly or when you suspect recurring patterns.
This skill analyzes and proposes only. It does NOT modify harness files.
- Trace files: `ls -t ~/.claude/traces/*.jsonl | head -14` (last ~2 weeks)
- Failure entries: `exit_code != "0"` or output contains error keywords (Error, Exception, FATAL, failed, denied, not found)

Group failures by root cause mechanism, not individual error messages:
| Cluster Type | Signal | Example |
|---|---|---|
| Tool misuse | Same command pattern fails repeatedly | git push blocked 5 times |
| Stale context | Edit failures after many edits without reads | 3+ edit-without-read sequences |
| Environment | Missing binary or wrong path | command not found patterns |
| Test regression | Same test fails across sessions | test_X fails in 4/7 sessions |
| Permission | Blocked by hooks or denied by user | BLOCKED: in output |
| Workflow | Repeated manual corrections after agent actions | Reverts, re-dos |
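
The failure filter described above (non-zero exit code, or an error keyword in the output) can be sketched in Python. This is a minimal sketch, not the skill's actual implementation: it assumes each trace line is a JSON object with `exit_code` and `output` fields, and that keywords are matched case-sensitively as listed. The function names `is_failure` and `failures_in` are hypothetical.

```python
import json

# Keywords taken from the skill description; case-sensitive matching is an assumption.
ERROR_KEYWORDS = ("Error", "Exception", "FATAL", "failed", "denied", "not found")


def is_failure(entry: dict) -> bool:
    """Return True when a trace entry looks like a failure.

    Assumes hypothetical fields: `exit_code` (string or int) and `output` (text).
    A missing exit code is treated as success unless the output matches a keyword.
    """
    exit_code = entry.get("exit_code")
    nonzero_exit = exit_code is not None and str(exit_code) != "0"
    output = str(entry.get("output") or "")
    return nonzero_exit or any(keyword in output for keyword in ERROR_KEYWORDS)


def failures_in(lines):
    """Yield failing entries from an iterable of JSONL lines.

    Blank lines and lines that are not valid JSON objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and is_failure(entry):
            yield entry
```

To scan a file, pass an open file handle, for example `list(failures_in(open(path)))`. Keyword matching on raw output is deliberately loose, so expect some false positives (for instance, a passing test whose name contains "Error").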
Prioritize clusters with high total_failures and high sessions_affected.
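
One way to turn that prioritization into a ranking is sketched below. The source names the two metrics but not how to combine them, so sorting by `total_failures` first and using `sessions_affected` as the tie-breaker is an assumption, as are the hypothetical `rank_clusters` helper, the `(cluster_name, session_id)` input shape, and the default cap of five clusters (matching the skill's top-5 limit).

```python
from collections import defaultdict


def rank_clusters(failures, limit=5):
    """Rank failure clusters by total failures, then by sessions affected.

    `failures` is an iterable of (cluster_name, session_id) pairs, one per
    failing trace entry. Returns at most `limit` summary rows, highest first.
    """
    totals = defaultdict(int)
    sessions = defaultdict(set)
    for cluster, session_id in failures:
        totals[cluster] += 1
        sessions[cluster].add(session_id)

    rows = [
        {
            "cluster": cluster,
            "total_failures": totals[cluster],
            "sessions_affected": len(sessions[cluster]),
        }
        for cluster in totals
    ]
    # Descending on both metrics; cluster name makes the order deterministic.
    rows.sort(key=lambda r: (-r["total_failures"], -r["sessions_affected"], r["cluster"]))
    return rows[:limit]
```

A cluster that fails often within a single session may be a one-off, while one that recurs across many sessions is more likely systemic, so a weighted score could be substituted for the sort key if the simple ordering proves too crude.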
For each cluster (top 5 max), propose ONE of:
- rules.d/ rule: if the failure is a behavioral pattern (agent keeps doing X when it shouldn't)

For each proposal, include:
## Harness Evolution Report
**Period:** [date range]
**Sessions analyzed:** N
**Total trace entries:** N
**Failures found:** N (N% error rate)
---
### Cluster 1: [descriptive name]
- **Type:** [tool misuse | stale context | environment | test regression | permission | workflow]
- **Frequency:** N occurrences across M sessions
- **Example traces:**
[2-3 representative trace entries]
- **Root cause:** [mechanism explanation]
- **Proposed change:** [specific suggestion with file path]
- **Token impact:** [estimated additional chars/message if rules.d/ change]
- **Regression risk:** [low | medium | high] — [why]
### Cluster 2: ...
---
### Summary
| Metric | Value |
|--------|-------|
| Clusters found | N |
| Proposed rule changes | N |
| Proposed hook changes | N |
| Proposed skill changes | N |
| No action needed | N |
### Next Steps
- [prioritized list of what to implement first]
- [suggested re-analysis date]
npx claudepluginhub artmin96/forge-studio --plugin traces