From solo
Performs post-pipeline retrospectives: parses logs, counts productive vs wasted iterations, identifies failure patterns, scores runs, suggests fixes to skills/scripts.
Slash command: /solo:retro
This skill is self-contained — follow the phases below instead of delegating to other skills (/review, /audit, /build) or spawning Task subagents. Run all analysis directly.
Post-pipeline retrospective. Parses pipeline logs, counts productive vs wasted iterations, identifies recurring failure patterns, scores the pipeline run, and suggests concrete patches to skills/scripts to prevent the same failures next time.
git branch --show-current 2>/dev/null
git log --oneline -10 2>/dev/null
git diff --name-only HEAD~5..HEAD 2>/dev/null | head -20
Use after a pipeline completes (or gets cancelled). This is the process quality check: /review checks code quality, /retro checks pipeline process quality.
Can also be used standalone on any project — with or without pipeline logs.
session_search(query): find past pipeline runs and known issues
codegraph_explain(project): understand project architecture context
codegraph_query(query): query code graph for project metadata
If MCP tools are not available, fall back to Glob + Grep + Read.
Detect project from $ARGUMENTS or CWD:
(e.g. ~/projects/my-app -> my-app)
Find pipeline state file: .solo/pipelines/solo-pipeline-{project}.local.md (project-local) or ~/.solo/pipelines/solo-pipeline-{project}.local.md (global fallback)
project_root:
Verify artifacts exist (parallel reads):
{project_root}/.solo/pipelines/pipeline.log
{project_root}/.solo/pipelines/iter-*.log
{project_root}/.solo/pipelines/progress.md
{project_root}/docs/plan-done/
{project_root}/docs/plan/
Determine analysis mode:
Count iter logs (if they exist): ls {project_root}/.solo/pipelines/iter-*.log | wc -l
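The detection steps above can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, and `home` is passed in explicitly (rather than read from the environment) only so the lookup order is easy to test.

```python
from pathlib import Path
from typing import Optional


def project_name(path: str) -> str:
    """Derive the project name from a directory path (~/projects/my-app -> my-app)."""
    return Path(path).expanduser().name


def find_state_file(project_root: Path, home: Path) -> Optional[Path]:
    """Return the project-local state file if present, else the global fallback, else None."""
    name = f"solo-pipeline-{project_root.name}.local.md"
    candidates = (
        project_root / ".solo" / "pipelines" / name,  # project-local
        home / ".solo" / "pipelines" / name,          # global fallback
    )
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def count_iter_logs(project_root: Path) -> int:
    """Count iter-*.log files; returns 0 when the pipelines directory is missing."""
    pipelines = project_root / ".solo" / "pipelines"
    if not pipelines.is_dir():
        return 0
    return len(list(pipelines.glob("iter-*.log")))
```

A count of zero from `count_iter_logs` is one possible trigger for the no-logs mode described next.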
If no pipeline logs exist, the retro can still provide value by analyzing:
git log --oneline --since="1 week ago": commit frequency, patterns, conventional format
git log --oneline -- CLAUDE.md: how docs evolved
Skip Phases 2-4 and proceed directly to Phase 5 (Plan Fidelity) and Phase 6 (Git & Code Quality). Adjust Phase 7 scoring to weight available data more heavily.
Read pipeline.log in full. Parse line-by-line, extracting structured data from log tags:
Log format: [HH:MM:SS] TAG | message
Extract by tag:
| Tag | What to extract |
|---|---|
| START | Pipeline run boundary; count restarts (multiple START lines = restarts) |
| STAGE | `iter N/M \| stage S/T: {stage_id}`: iteration count per stage |
| SIGNAL | `<solo:done/>` or `<solo:redo/>`: which stages got completion signals |
| INVOKE | Skill invoked: extract skill name, check for wrong names |
| ITER | `commit: {sha} \| result: {stage complete\|continuing}`: per-iteration outcome |
| CHECK | `{stage} \| {path} -> FOUND\|NOT FOUND`: marker file checks |
| FINISH | `Duration: {N}m`: total duration per run |
| MAXITER | `Reached max iterations ({N})`: hit iteration ceiling |
| QUEUE | Plan cycling events (activating, archiving) |
| CIRCUIT | Circuit breaker triggered (if present) |
| CWD | Working directory changes |
| CTRL | Control signals (pause/stop/skip) |
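As a rough illustration of the `[HH:MM:SS] TAG | message` format, the sketch below splits one log line into its three parts. The regex, the `LogEntry` type, and the sample line in the usage note are assumptions made for this example; only the overall line shape comes from the format description above.

```python
import re
from typing import NamedTuple, Optional

# [HH:MM:SS] TAG | message  (the message itself may contain further " | " separators)
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\s+([A-Z]+)\s*\|\s*(.*)$")


class LogEntry(NamedTuple):
    time: str
    tag: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one pipeline.log line; return None for lines that do not match the format."""
    match = LOG_LINE.match(line.strip())
    return LogEntry(*match.groups()) if match else None
```

For example, `parse_line("[12:01:05] ITER | commit: abc123 | result: stage complete")` yields the tag `ITER` with everything after the first separator kept as the message, so tag-specific extraction can run on the message afterwards.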
Compute metrics:
total_runs = count of START lines
total_iterations = count of ITER lines
productive_iters = count of ITER lines with "stage complete"
wasted_iters = total_iterations - productive_iters
waste_pct = wasted_iters / total_iterations * 100
maxiter_hits = count of MAXITER lines
plan_cycles = count of QUEUE lines with "Cycling"
per_stage = {
    stage_id: {
        attempts: count of STAGE lines for this stage,
        successes: count of ITER lines with "stage complete" for this stage,
        waste_ratio: (attempts - successes) / attempts * 100,
    }
}
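The metric definitions above could be computed along these lines. This is a sketch under two assumptions that the log format does not spell out: an ITER line is attributed to the most recent STAGE line, and `waste_pct` is reported as 0.0 when there are no iterations (to avoid dividing by zero). The function name is hypothetical.

```python
import re
from collections import Counter

LINE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s+([A-Z]+)\s*\|\s*(.*)$")
STAGE_ID = re.compile(r"stage \d+/\d+:\s*(\S+)")


def compute_metrics(lines):
    """Compute run-level and per-stage metrics from pipeline.log lines."""
    tags = Counter()
    attempts, successes = Counter(), Counter()
    productive = plan_cycles = 0
    current_stage = None  # assumption: ITER lines belong to the latest STAGE line
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        tag, message = match.groups()
        tags[tag] += 1
        if tag == "STAGE":
            found = STAGE_ID.search(message)
            current_stage = found.group(1) if found else None
            if current_stage:
                attempts[current_stage] += 1
        elif tag == "ITER" and "stage complete" in message:
            productive += 1
            if current_stage:
                successes[current_stage] += 1
        elif tag == "QUEUE" and "Cycling" in message:
            plan_cycles += 1
    total = tags["ITER"]
    wasted = total - productive
    return {
        "total_runs": tags["START"],
        "total_iterations": total,
        "productive_iters": productive,
        "wasted_iters": wasted,
        "waste_pct": wasted / total * 100 if total else 0.0,
        "maxiter_hits": tags["MAXITER"],
        "plan_cycles": plan_cycles,
        "per_stage": {
            stage: {
                "attempts": n,
                "successes": successes[stage],
                "waste_ratio": (n - successes[stage]) / n * 100,
            }
            for stage, n in attempts.items()
        },
    }
```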
Read progress.md and scan for error patterns:
Unknown skill: extract which skill name was wrong
<solo:done/><solo:done/> in same iteration -> minor noise (note but don't penalize)
For each error pattern found, record:
Do NOT read all iter logs — could be 60+. Use smart sampling:
First failed iter per pattern: For each failure pattern found in Phase 3, read the first iter log that shows it
sed 's/\x1b\[[0-9;]*m//g' < iter-NNN-stage.log | head -100
First successful iter per stage: For each stage that eventually succeeded, read the first successful iter log
<solo:done/> in the output
Final review iter: Read the last iter-*-review.log (the verdict)
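For readers who prefer to do the sampling without `sed`, the sketch below applies the same ANSI color pattern and the same 100-line cap, and checks a sampled log for a completion signal. The function names are hypothetical; the pattern and the signal strings are the ones quoted above.

```python
import re
from itertools import islice
from typing import Iterable, List, Optional

# Same pattern as the sed expression: ESC [ digits/semicolons m
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def sample_log(lines: Iterable[str], limit: int = 100) -> List[str]:
    """Strip ANSI color codes from the first `limit` lines (like `sed ... | head -100`)."""
    return [ANSI_COLOR.sub("", line) for line in islice(lines, limit)]


def find_signal(lines: Iterable[str]) -> Optional[str]:
    """Return 'done' or 'redo' for the first signal tag found, or None if absent."""
    for line in lines:
        if "<solo:done/>" in line:
            return "done"
        if "<solo:redo/>" in line:
            return "redo"
    return None
```

Passing an open file handle as `lines` keeps memory use flat, since only the first `limit` lines are consumed.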
Extract from each sampled log:
Error, error, Unknown, failed)
<solo:done/> or <solo:redo/> present?)
For each track directory in docs/plan-done/ and docs/plan/:
Read spec.md (if exists):
- [ ] and - [x] checkboxes
criteria_met = checked / total * 100
Read plan.md (if exists):
- [ ] and - [x] checkboxes
<!-- sha:... -->)
tasks_done = checked / total * 100
Compile per-track summary:
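The checkbox counting behind `criteria_met` and `tasks_done` can be sketched as follows. This is an illustration, not the skill's own code: it assumes indented (nested) checkboxes and an uppercase `X` should also count, and it returns `None` for the percentage when a file has no checkboxes at all.

```python
import re
from typing import Optional, Tuple

# Matches "- [ ]" and "- [x]" at the start of a line, allowing indentation
CHECKBOX = re.compile(r"^\s*- \[([ xX])\]", re.MULTILINE)


def checkbox_stats(markdown: str) -> Tuple[int, int, Optional[float]]:
    """Return (checked, total, percent checked) for a spec.md or plan.md body."""
    marks = CHECKBOX.findall(markdown)
    total = len(marks)
    checked = sum(1 for mark in marks if mark in "xX")
    pct = checked / total * 100 if total else None
    return checked, total, pct
```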
Quick checks only — NOT a full /review:
Commit count and format:
git -C {project_root} log --oneline | wc -l
git -C {project_root} log --oneline | head -30
feat:, fix:, chore:, test:, docs:, refactor:, build:, ci:, perf:)
conventional_pct = conventional / total * 100
Committer breakdown:
git -C {project_root} shortlog -sn --no-merges | head -10
Test status (if test command exists in CLAUDE.md or package.json):
Build status (if build command exists):
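The `conventional_pct` figure from the commit check can be computed from `git log --oneline` output roughly as below. The prefix list is the one given above; accepting an optional scope such as `fix(retro):` and an optional `!` marker is an assumption of this sketch, as is the function name.

```python
import re
from typing import Iterable

# "<sha> <type>[(scope)][!]: subject", with the types listed in the commit check
CONVENTIONAL = re.compile(
    r"^[0-9a-f]+ (feat|fix|chore|test|docs|refactor|build|ci|perf)(\([^)]*\))?!?: "
)


def conventional_pct(oneline_log: Iterable[str]) -> float:
    """Percentage of `git log --oneline` lines that use a conventional prefix."""
    lines = [line for line in oneline_log if line.strip()]
    if not lines:
        return 0.0
    conventional = sum(1 for line in lines if CONVENTIONAL.match(line))
    return conventional / len(lines) * 100
```

The input would typically be the captured output of `git -C {project_root} log --oneline`, split into lines.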
Check for signs of context window problems during the pipeline run:
Iteration quality curve: Compare early iterations vs late iterations.
Observation masking usage: Check if scratch/ directory exists in project root.
Plan recitation evidence: In sampled iter logs, check if the agent re-read plan.md at task boundaries.
CLAUDE.md bloat: wc -c {project_root}/CLAUDE.md
Over 40,000 chars: WARN (attention dilution likely)
Over 60,000 chars: RED (severe context budget pressure)
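The size check can be expressed as a small classifier. This sketch assumes the two thresholds are strict upper bounds (a file of exactly 40,000 characters is still OK) and uses the labels from the thresholds above; the function names are hypothetical.

```python
from pathlib import Path


def classify_claude_md(size_chars: int) -> str:
    """Map a CLAUDE.md size to OK / WARN / RED using the 40,000 and 60,000 thresholds."""
    if size_chars > 60_000:
        return "RED"
    if size_chars > 40_000:
        return "WARN"
    return "OK"


def claude_md_size(project_root: Path) -> int:
    """Byte count of CLAUDE.md (what `wc -c` reports); 0 if the file is missing."""
    path = project_root / "CLAUDE.md"
    return len(path.read_bytes()) if path.is_file() else 0
```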
Add findings to the report under ## Context Health:
## Context Health
- Iteration quality trend: {STABLE / DEGRADING / N/A}
- Observation masking: {USED / NOT USED / N/A}
- Plan recitation: {OBSERVED / ABSENT / N/A}
- CLAUDE.md size: {N} chars — {OK / WARN / BLOATED}
Load scoring rubric from ${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md.
If plugin root not available, use the embedded weights:
Scoring weights:
Note: In fallback mode (no pipeline logs), redistribute Efficiency and Stability weights to Fidelity, Quality, and Commits.
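One way to read the fallback note is as a proportional redistribution: drop the dimensions that need pipeline logs and scale the remaining weights so the total is unchanged. The sketch below shows that arithmetic only. The proportional rule is an assumption, and the numbers in the usage note are placeholders chosen for the example, not the skill's actual weights (those live in eval-dimensions.md).

```python
from typing import Dict, Iterable


def redistribute(weights: Dict[str, float], dropped: Iterable[str]) -> Dict[str, float]:
    """Remove `dropped` dimensions and rescale the rest to keep the same total weight."""
    dropped = set(dropped)
    kept = {name: w for name, w in weights.items() if name not in dropped}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("no weight left to redistribute onto")
    total = sum(weights.values())
    return {name: w * total / kept_total for name, w in kept.items()}
```

With placeholder weights of efficiency 30, stability 20, fidelity 20, quality 20, commits 10, dropping efficiency and stability leaves fidelity 40, quality 40, commits 20.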
Generate report at {project_root}/docs/retro/{date}-retro.md:
# Pipeline Retro: {project} ({date})
## Overall Score: {N}/10
## Pipeline Efficiency
| Metric | Value | Rating |
|--------|-------|--------|
| Total iterations | {N} | |
| Productive iterations | {N} ({pct}%) | {emoji} |
| Wasted iterations | {N} ({pct}%) | {emoji} |
| Pipeline restarts | {N} | {emoji} |
| Max-iter hits | {N} | {emoji} |
| Total duration | {time} | {emoji} |
| Tracks completed | {N} | |
| Duration per track | {time/tracks} | {emoji} |
## Per-Stage Breakdown
| Stage | Attempts | Successes | Waste % | Notes |
|-------|----------|-----------|---------|-------|
| scaffold | | | | |
| setup | | | | |
| plan | | | | |
| build | | | | |
| deploy | | | | |
| review | | | | |
## Failure Patterns
### Pattern 1: {name}
- **Occurrences:** {N} iterations
- **Root cause:** {analysis}
- **Wasted:** {N} iterations
- **Fix:** {concrete suggestion with file reference}
### Pattern 2: ...
## Plan Fidelity
| Track | Criteria Met | Tasks Done | SHAs | Rating |
|-------|-------------|------------|------|--------|
| {track-id} | {N}% | {N}% | {yes/no} | {emoji} |
## Code Quality (Quick)
- **Tests:** {N} pass, {N} fail (or "not configured")
- **Build:** PASS / FAIL (or "not configured")
- **Commits:** {N} total, {pct}% conventional format
## Three-Axis Growth
| Axis | Score | Evidence |
|------|-------|----------|
| **Technical** (code, tools, architecture) | {0-10} | {what changed} |
| **Cognitive** (understanding, strategy, decisions) | {0-10} | {what improved} |
| **Process** (harness, skills, pipeline, docs) | {0-10} | {what evolved} |
If only one axis is served — note what's missing.
## Recommendations
1. **[CRITICAL]** {patch suggestion with file:line reference}
2. **[HIGH]** {improvement}
3. **[MEDIUM]** {optimization}
4. **[LOW]** {nice-to-have}
## Suggested Patches
### Patch 1: {file} — {description}
**What:** {one-line description}
**Why:** {root cause reference from Failure Patterns}
\```diff
- old line
+ new line
\```
Rating guide (use these emojis):
After generating the report:
Show summary to user: overall score, top 3 failure patterns, top 3 recommendations
For each suggested patch (if any), use AskUserQuestion:
If "Show diff first": display the full diff, then ask again (Apply / Skip)
If "Apply": use Edit tool to apply the change directly
After all patches processed:
fix(retro): {description}
After patching, revise the project's CLAUDE.md to keep it lean and useful for future agents.
wc -c CLAUDE.md
git add CLAUDE.md && git commit -m "docs: revise CLAUDE.md (post-retro)"
Run this phase only if ${CLAUDE_PLUGIN_ROOT} is available (i.e., solo-factory is installed). Skip if running as a standalone skill without the factory context.
After evaluating the project pipeline, step back and evaluate the factory itself — the skills, scripts, and pipeline logic that produced this result. Be a harsh critic.
Read the skills that were invoked in this pipeline run (from INVOKE lines in pipeline.log):
${CLAUDE_PLUGIN_ROOT}/skills/{stage}/SKILL.md
Read pipeline script signal handling and stage logic:
${CLAUDE_PLUGIN_ROOT}/scripts/solo-dev.sh
Cross-reference with failure patterns from Phase 3:
Factory Score: {N}/10
Skill quality:
- {skill}: {score}/10 — {why}
- {skill}: {score}/10 — {why}
Pipeline reliability: {N}/10 — {why}
Missing capabilities:
- {what the factory couldn't do that it should have}
Top factory defects:
1. {defect} → {which file to fix} → {concrete fix}
2. {defect} → {which file to fix} → {concrete fix}
After scoring the factory, step back further and think about the harness — the entire system that guides agents (CLAUDE.md, docs/, linters, skills, templates). Ask:
Context engineering: Did the agent have everything it needed in-repo? Or did it struggle because knowledge was missing / scattered / stale?
docs/ or CLAUDE.md
Architectural constraints: Did the agent break module boundaries, produce inconsistent patterns, or ignore conventions?
Decision traces: What worked well that future agents should reuse? What failed that they should avoid?
Skill gaps: Which skills need better instructions? Which new skills should exist?
Append findings to {project_root}/docs/evolution.md (create if not exists). If ~/.solo/evolution.md exists, append there as well for cross-project tracking.
## {YYYY-MM-DD} | {project} | Factory Score: {N}/10
Pipeline: {stages run} | Iters: {total} | Waste: {pct}%
### Defects
- **{severity}** | {skill/script}: {description}
- Fix: {concrete file:change}
### Harness Gaps
- **Context:** {what knowledge was missing or stale for the agent}
- **Constraints:** {what boundary violations or inconsistencies occurred}
- **Precedents:** {patterns worth capturing for future agents — good or bad}
### Missing
- {capability the factory lacked}
### What worked well
- {skill/pattern that performed efficiently}
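The append rule for the evolution log (always write to the project file, creating it if needed; also write to the global file only when it already exists) can be sketched like this. The function name is hypothetical, and `home` is passed in explicitly so the behavior can be exercised against a temporary directory.

```python
from pathlib import Path
from typing import List


def append_evolution(entry: str, project_root: Path, home: Path) -> List[Path]:
    """Append `entry` to docs/evolution.md, and to ~/.solo/evolution.md if it exists."""
    local = project_root / "docs" / "evolution.md"
    local.parent.mkdir(parents=True, exist_ok=True)  # create docs/ if missing
    targets = [local]
    global_log = home / ".solo" / "evolution.md"
    if global_log.is_file():  # never create the global file, only extend it
        targets.append(global_log)
    for target in targets:
        with target.open("a", encoding="utf-8") as handle:
            handle.write(entry.rstrip("\n") + "\n\n")
    return targets
```

Opening in append mode keeps earlier entries intact, which matters because the file is meant to accumulate findings across retros.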
Rules:
Output signal: <solo:done/>
Important: /retro always outputs <solo:done/> — it never needs redo. Even if pipeline was terrible, the retro itself always completes.
${CLAUDE_PLUGIN_ROOT}/skills/retro/references/eval-dimensions.md: scoring rubric (8 axes, weights)
${CLAUDE_PLUGIN_ROOT}/skills/retro/references/failure-catalog.md: known failure patterns and fixes
npx claudepluginhub fortunto2/solo-factory --plugin solo