Universal Autoresearch Plugin
Autonomous iterative experimentation with LLM-driven evaluation for Claude Code.
Inspired by karpathy/autoresearch — generalized from LLM training optimization to any measurable goal.
Key differences from autoresearch
- General purpose — not ML-specific. Optimize bundle size, reduce latency, improve test coverage, refactor code quality — anything with observable results.
- LLM-driven evaluation — a blind-review sub-agent judges each experiment instead of fixed numeric metrics. The evaluator sees only the criteria and results, never the code changes.
- Structured experiment journal — persistent context across iterations so the agent learns from its own history.
- Configurable termination — max iterations, consecutive failure limit, or goal-achieved detection.
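The blind-review idea can be illustrated with a minimal sketch. This is not the plugin's code (the real evaluator prompt lives in templates/eval-prompt.md); the `Experiment` class, its field names, and `build_eval_input` are hypothetical, chosen only to show that the text passed to the evaluator is built from the criteria and the results and never from the code changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical record of one iteration (field names are assumptions)."""
    hypothesis: str
    diff: str      # code changes; deliberately withheld from the evaluator
    results: str   # captured output of the execution commands


def build_eval_input(criteria: str, experiment: Experiment) -> str:
    """Assemble the evaluator's input from criteria and results only."""
    return (
        "## Evaluation Criteria\n" + criteria.strip() + "\n\n"
        "## Experiment Results\n" + experiment.results.strip() + "\n"
    )
```

Because the function never reads `experiment.diff` or `experiment.hypothesis`, the evaluator cannot be swayed by how plausible the change looks; it can only judge the observed outcome.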
How it works
┌─────────────────────────────────────────────────┐
│ Experiment Loop │
│ │
│ 1. Read config + journal │
│ 2. Formulate hypothesis │
│ 3. Modify code → git commit │
│ 4. Run experiment → collect results │
│ 5. Evaluation sub-agent (blind review) │
│ ├─ keep → branch advances │
│ └─ discard → git reset │
│ 6. Update journal + results.tsv │
│ 7. Stop hook re-injects prompt → next iteration │
└─────────────────────────────────────────────────┘
The loop is driven by a stop hook. When Claude finishes a turn, the hook intercepts the exit, increments the iteration counter, and re-injects the experiment prompt. This keeps the agent looping autonomously until a termination condition is met.
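The decision the stop hook makes on each turn exit can be sketched as a small pure function. The actual controller is the shell script hooks/stop-hook.sh; the Python below is only an illustration, and the `LoopState` fields, the order of the checks, and the returned strings are assumptions rather than the plugin's real behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical in-memory view of the loop state."""
    iteration: int = 0
    consecutive_failures: int = 0
    goal_achieved: bool = False


def next_action(
    state: LoopState,
    max_iterations: Optional[int],      # None means unlimited
    max_consecutive_failures: int = 5,
) -> str:
    """Return "continue" (re-inject the prompt) or a "stop: ..." reason."""
    if state.goal_achieved:
        return "stop: goal achieved"
    if state.consecutive_failures >= max_consecutive_failures:
        return "stop: consecutive failure limit"
    if max_iterations is not None and state.iteration >= max_iterations:
        return "stop: max iterations"
    state.iteration += 1  # the hook increments the counter before re-injecting
    return "continue"
```

For example, with `max_iterations=30` a state at iteration 29 continues once more and then stops, while `max_iterations=None` never stops on the iteration count alone.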
Install
claude plugins add namjug-kim/universal-autoresearch-claude-plugin
Quick start
# With inline goal
/autoresearch "Optimize bundle size" --max-iterations 30
# With config file
/autoresearch --config ./autoresearch.config.md
# Cancel at any time
/cancel-autoresearch
Config file format
The config file uses Markdown with YAML frontmatter:
---
name: "Bundle Size Optimization"
max_iterations: 30
max_consecutive_failures: 5
---
## Goal
Minimize the web application bundle size while maintaining all existing functionality.
## Evaluation Criteria
- Bundle size (KB) should decrease compared to the current best
- ALL existing tests must pass (`pnpm test --run`)
- No functionality may be removed or degraded
## Execution Method
Run these commands to collect experiment results:
1. `pnpm build 2>&1` — Build the application
2. `du -sh dist/` — Measure total bundle size
3. `pnpm test --run 2>&1 | tail -20` — Run tests
## Constraints
- Do not add new external dependencies
- Do not modify test files
See templates/config.example.md for a full example.
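To make the format concrete, here is a minimal sketch of how such a file could be split into its frontmatter and its `## ` sections. It is not the plugin's parser (setup is done by scripts/setup-autoresearch.sh); `parse_config` is a hypothetical helper, and it only handles flat `key: value` frontmatter rather than full YAML.

```python
import re
from typing import Dict, Tuple, Union

Meta = Dict[str, Union[int, str]]


def parse_config(text: str) -> Tuple[Meta, Dict[str, str]]:
    """Split a config into (frontmatter dict, {section heading: body})."""
    meta: Meta = {}
    body = text
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if match:
        for line in match.group(1).splitlines():
            key, sep, value = line.partition(":")
            if sep:
                value = value.strip().strip('"')
                # Treat purely numeric values as integers, everything else as text.
                meta[key.strip()] = int(value) if value.isdigit() else value
        body = match.group(2)

    collected: Dict[str, list] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    sections = {name: "\n".join(lines).strip() for name, lines in collected.items()}
    return meta, sections
```

Applied to the example above, `meta["max_iterations"]` would be the integer `30`, and `sections` would have the keys `Goal`, `Evaluation Criteria`, `Execution Method`, and `Constraints`.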
Files created in your project
| File | Purpose | Git tracked? |
|---|---|---|
| `autoresearch.config.md` | Goal + evaluation criteria | Yes |
| `experiment-journal.md` | Structured experiment history | No |
| `results.tsv` | Quick metrics log | No |
| `run.log` | Latest experiment output | No |
| `.claude/autoresearch.local.md` | Loop state (internal) | No |
Options
| Option | Default | Description |
|---|---|---|
| `--config <path>` | `./autoresearch.config.md` | Path to config file |
| `--max-iterations <n>` | unlimited | Stop after N experiments |
| `--max-consecutive-failures <n>` | 5 | Stop after N consecutive discards |
Architecture
The plugin uses the same stop hook pattern as ralph-loop: a state file + stop hook + prompt template.
commands/
autoresearch.md # /autoresearch slash command
cancel-autoresearch.md # /cancel-autoresearch slash command
help.md # /autoresearch-help slash command
hooks/
hooks.json # Registers the stop hook
stop-hook.sh # Loop controller: checks termination, re-injects prompt
scripts/
setup-autoresearch.sh # Parses CLI args, creates state file, initializes journal
templates/
loop-prompt.md # Core experiment iteration instructions
eval-prompt.md # Blind-review evaluation sub-agent prompt
config.example.md # Example config file
The state file (.claude/autoresearch.local.md) tracks the iteration count, the number of consecutive failures, and the session ID. On each turn exit, the stop hook reads this file to decide whether to end the loop or re-inject the loop prompt for another iteration.
Session isolation: the state file records the session ID at creation. The stop hook only activates for the session that started the loop, so other Claude Code sessions in the same project are unaffected.
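The session check amounts to a simple comparison, sketched below. The function name, the dictionary representation of the state, and the `session_id` key are assumptions made for illustration; the real check is performed inside hooks/stop-hook.sh against the state file.

```python
from typing import Optional


def hook_applies(state: Optional[dict], current_session_id: str) -> bool:
    """Return True only for the session that started the loop.

    `state` is a hypothetical parsed view of the state file, or None when
    no loop is active in this project.
    """
    if state is None:
        return False  # no state file: let the session exit normally
    return state.get("session_id") == current_session_id
```

A second Claude Code session in the same project has a different session ID, so the comparison fails and that session exits as usual.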