From autoresearch
Analyze results from an autoresearch experiment loop. This skill should be used when the user asks to "analyze experiment results", "show research progress", "summarize experiments", "what improved", "experiment statistics", or needs to review and visualize the results from results.tsv.
Slash command
/autoresearch:analyze-results
This skill is limited to the following tools:
Analyze results.tsv from autoresearch experiment loops to extract insights, identify patterns, and suggest next experiment directions.
Read results.tsv (or the path given in the results_file field of experiment-config.yaml).
Parse tab-separated columns: commit, metric, status, description.
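The read-and-parse step above could be sketched as follows. This is a minimal illustration, not the bundled script: it assumes the file starts with a header row naming the four columns, and that a row with an empty or non-numeric metric (for example a crashed run) should carry `None`. The function name `parse_results` is hypothetical.

```python
import csv
import io


def parse_results(text):
    """Parse results.tsv content into a list of dicts, one per experiment.

    Assumption: the first line is a header with the columns
    commit, metric, status, description.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        try:
            metric = float(row["metric"])
        except (TypeError, ValueError):
            metric = None  # missing or non-numeric metric, e.g. a crashed run
        rows.append(
            {
                "commit": row["commit"],
                "metric": metric,
                "status": row["status"],
                "description": row["description"],
            }
        )
    return rows


sample = (
    "commit\tmetric\tstatus\tdescription\n"
    "a1b2c3d\t0.9500\tkeep\tinitial baseline\n"
    "b2c3d4e\t0.9320\tkeep\tincrease learning rate\n"
)
for row in parse_results(sample):
    print(row["commit"], row["metric"], row["status"])
```

To read from disk, the file contents can be passed in with `parse_results(open("results.tsv", encoding="utf-8").read())`.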
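Calculate and report the total number of experiments, the number kept, discarded, and crashed, the keep rate, and the improvement from the baseline to the current best metric.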
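One possible way to compute these summary figures is sketched below. Several things here are assumptions rather than facts from the skill: the status labels `discard` and `crash` (only `keep` is quoted in the text), the convention that a lower metric is better (suggested by the negative deltas in the example table), and the choice of the first kept row as the baseline. The name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(rows, lower_is_better=True):
    """Compute counts, keep rate, and baseline-to-best improvement.

    Assumptions: statuses are "keep", "discard", "crash"; the first kept
    row is the baseline; lower metric values are better by default.
    """
    counts = Counter(r["status"] for r in rows)
    kept = [r["metric"] for r in rows if r["status"] == "keep" and r["metric"] is not None]
    total = len(rows)
    summary = {
        "total": total,
        "kept": counts["keep"],
        "discarded": counts["discard"],
        "crashed": counts["crash"],
        "keep_rate_pct": 100.0 * counts["keep"] / total if total else 0.0,
        "baseline": None,
        "best": None,
        "improvement_pct": None,
    }
    if kept:
        baseline = kept[0]
        best = min(kept) if lower_is_better else max(kept)
        summary["baseline"] = baseline
        summary["best"] = best
        if baseline != 0:
            change = (baseline - best) if lower_is_better else (best - baseline)
            summary["improvement_pct"] = 100.0 * change / abs(baseline)
    return summary


rows = [
    {"commit": "a1b2c3d", "metric": 0.9500, "status": "keep", "description": "initial baseline"},
    {"commit": "0000001", "metric": 0.9600, "status": "discard", "description": "hypothetical discarded run"},
    {"commit": "b2c3d4e", "metric": 0.9320, "status": "keep", "description": "increase learning rate"},
    {"commit": "0000002", "metric": None, "status": "crash", "description": "hypothetical crashed run"},
]
print(summarize(rows))
```

Passing `lower_is_better=False` flips the comparison for metrics such as accuracy, where larger values are improvements.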
List all "keep" experiments in order, showing cumulative improvement:
# | commit | metric | delta | description
1 | a1b2c3d | 0.9500 | baseline | initial baseline
2 | b2c3d4e | 0.9320 | -0.0180 | increase learning rate
3 | c3d4e5f | 0.9210 | -0.0110 | reduce model depth
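The table above can be produced along these lines. This is a sketch under the assumption that each delta is the difference from the previous kept experiment, which is what the example values show (0.9320 - 0.9500 = -0.0180). The function names and the discarded sample row are hypothetical.

```python
def kept_progress(rows):
    """Return kept experiments in order, each with a delta vs. the previous kept one."""
    progress = []
    previous = None
    for r in rows:
        if r["status"] != "keep":
            continue
        delta = None if previous is None else r["metric"] - previous
        progress.append({**r, "delta": delta})
        previous = r["metric"]
    return progress


def format_progress(progress):
    """Render the progress rows in the pipe-separated layout used above."""
    lines = ["# | commit | metric | delta | description"]
    for i, p in enumerate(progress, start=1):
        delta = "baseline" if p["delta"] is None else f"{p['delta']:+.4f}"
        lines.append(f"{i} | {p['commit']} | {p['metric']:.4f} | {delta} | {p['description']}")
    return "\n".join(lines)


rows = [
    {"commit": "a1b2c3d", "metric": 0.9500, "status": "keep", "description": "initial baseline"},
    {"commit": "0000001", "metric": 0.9600, "status": "discard", "description": "hypothetical discarded run"},
    {"commit": "b2c3d4e", "metric": 0.9320, "status": "keep", "description": "increase learning rate"},
]
print(format_progress(kept_progress(rows)))
```

Discarded and crashed rows are skipped entirely, so they do not affect the delta of the next kept experiment.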
Identify patterns in discarded and crashed experiments, such as recurring failure modes.
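As a rough starting point for spotting such patterns, the descriptions of non-kept experiments can be tallied by keyword. This is a crude heuristic of my own choosing, not something the skill prescribes: the status labels `discard` and `crash`, the stopword list, and the name `failure_keywords` are all assumptions.

```python
import re
from collections import Counter

# Small, hypothetical stopword list; extend as needed.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "for", "with"}


def failure_keywords(rows, top_n=5):
    """Count how many discarded/crashed descriptions mention each keyword."""
    counts = {"discard": Counter(), "crash": Counter()}
    for r in rows:
        if r["status"] in counts:
            # A set counts each word at most once per experiment.
            words = set(re.findall(r"[a-z0-9_]+", r["description"].lower()))
            counts[r["status"]].update(words - STOPWORDS)
    return {status: c.most_common(top_n) for status, c in counts.items()}


rows = [
    {"commit": "0000001", "metric": 0.96, "status": "discard", "description": "increase batch size"},
    {"commit": "0000002", "metric": 0.97, "status": "discard", "description": "double the batch size"},
    {"commit": "0000003", "metric": None, "status": "crash", "description": "batch size 4096"},
]
for status, keywords in failure_keywords(rows).items():
    print(status, keywords)
```

Keywords that recur across many discarded or crashed runs hint at a direction that is not paying off, but the descriptions still need to be read to confirm the pattern.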
Based on the analysis, suggest specific directions for the next experiments.
Present results as a structured report:
## Experiment Report: <tag>
### Summary
- Total: N experiments (K kept, D discarded, C crashed)
- Keep rate: X%
- Baseline: <value> -> Current best: <value> (Y% improvement)
### Progress
[Table of kept experiments with deltas]
### Top Improvements
[Ranked list of experiments by improvement magnitude]
### Failure Patterns
[Common failure modes]
### Recommended Next Steps
1. [Specific suggestion based on data]
2. [Another suggestion]
3. [Another suggestion]
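The report template above could be filled in with a small helper such as the one below. It is only a sketch: the function name `render_report`, its parameters, and the keys of the `summary` dict are hypothetical, and the number formatting (four decimals for metrics, one for percentages) is an assumption.

```python
def render_report(tag, summary, progress_lines, top_improvements, failure_patterns, next_steps):
    """Fill the report template with precomputed values and lists of strings."""
    lines = [
        f"## Experiment Report: {tag}",
        "### Summary",
        f"- Total: {summary['total']} experiments ({summary['kept']} kept, "
        f"{summary['discarded']} discarded, {summary['crashed']} crashed)",
        f"- Keep rate: {summary['keep_rate_pct']:.1f}%",
        f"- Baseline: {summary['baseline']:.4f} -> Current best: {summary['best']:.4f} "
        f"({summary['improvement_pct']:.1f}% improvement)",
        "### Progress",
        *progress_lines,
        "### Top Improvements",
        *[f"{i}. {item}" for i, item in enumerate(top_improvements, start=1)],
        "### Failure Patterns",
        *[f"- {item}" for item in failure_patterns],
        "### Recommended Next Steps",
        *[f"{i}. {item}" for i, item in enumerate(next_steps, start=1)],
    ]
    return "\n".join(lines)


# Illustrative values only; none of these come from a real run.
summary = {
    "total": 4, "kept": 2, "discarded": 1, "crashed": 1,
    "keep_rate_pct": 50.0, "baseline": 0.9500, "best": 0.9320,
    "improvement_pct": 100.0 * (0.9500 - 0.9320) / 0.9500,
}
print(render_report(
    "demo",
    summary,
    ["1 | a1b2c3d | 0.9500 | baseline | initial baseline"],
    ["b2c3d4e: increase learning rate"],
    ["hypothetical failure pattern"],
    ["hypothetical next step"],
))
```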
Use the bundled analysis script for automated parsing:
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/parse-results.py results.tsv
This outputs JSON with summary statistics for programmatic use.
npx claudepluginhub hironow/dotfiles --plugin autoresearch