Universal Autoresearch Plugin

Autonomous iterative experimentation with LLM-driven evaluation for Claude Code.

Inspired by karpathy/autoresearch — generalized from LLM training optimization to any measurable goal.

Key differences from autoresearch

General purpose: not ML-specific. Shrink bundle size, reduce latency, raise test coverage, improve code quality, or pursue any other goal whose results can be observed.
LLM-driven evaluation: a blind-review sub-agent judges each experiment, rather than a comparison of fixed numeric metrics. The evaluator sees only the criteria and results, never the code changes.
Structured experiment journal: persistent context across iterations, so the agent learns from its own history.
Configurable termination: stop on max iterations, a consecutive-failure limit, or goal-achieved detection.
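
The blind-review idea can be sketched in a few lines of Python. This is an illustration only, not the contents of templates/eval-prompt.md: the `Experiment` fields, the `build_eval_input` name, the final instruction line, and the sample values are all assumptions made up for the example. The point it shows is that the diff is simply never handed to the evaluator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One iteration's artifacts (field names are illustrative)."""
    hypothesis: str
    diff: str      # code changes; visible to the experimenting agent only
    results: str   # captured command output, e.g. size and test summary


def build_eval_input(criteria: str, experiment: Experiment) -> str:
    """Assemble what the evaluator sees: criteria and results, no diff."""
    return (
        "## Evaluation Criteria\n"
        f"{criteria.strip()}\n\n"
        "## Results\n"
        f"{experiment.results.strip()}\n\n"
        "Answer with exactly one word: keep or discard."
    )


# Made-up sample values, purely for demonstration.
exp = Experiment(
    hypothesis="Tree-shake unused icons",
    diff="- import * as icons\n+ import { Check } from 'icons'",
    results="dist/ 412K\nTests: 128 passed",
)
eval_input = build_eval_input("Bundle size (KB) should decrease", exp)
print(eval_input)
```

Because the evaluator cannot see the change itself, it has to judge on outcomes, which makes it harder for a plausible-looking diff to earn a "keep" without measurable improvement.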

How it works

┌─────────────────────────────────────────────────┐
│                 Experiment Loop                   │
│                                                   │
│  1. Read config + journal                         │
│  2. Formulate hypothesis                          │
│  3. Modify code → git commit                      │
│  4. Run experiment → collect results              │
│  5. Evaluation sub-agent (blind review)           │
│     ├─ keep → branch advances                     │
│     └─ discard → git reset                        │
│  6. Update journal + results.tsv                  │
│  7. Stop hook re-injects prompt → next iteration  │
└─────────────────────────────────────────────────┘

The loop is driven by a stop hook: when Claude finishes a turn, the hook intercepts the exit, increments the iteration counter, and re-injects the experiment prompt — keeping the agent in an autonomous loop until a termination condition is met.
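
The termination check behind that loop might look roughly like the sketch below. It is written in Python for readability, whereas the plugin's controller is hooks/stop-hook.sh. The `LoopState` class, the function names, and the assumption that a "keep" resets the consecutive-failure counter are illustrative guesses, not taken from the plugin's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopState:
    iteration: int = 0
    consecutive_failures: int = 0
    max_iterations: Optional[int] = None   # None means unlimited
    max_consecutive_failures: int = 5
    goal_achieved: bool = False


def record_verdict(state: LoopState, verdict: str) -> None:
    """Update counters after the evaluator answers 'keep' or 'discard'."""
    if verdict == "keep":
        state.consecutive_failures = 0     # assumed: a success resets the streak
    elif verdict == "discard":
        state.consecutive_failures += 1
    else:
        raise ValueError(f"unknown verdict: {verdict!r}")


def stop_reason(state: LoopState) -> Optional[str]:
    """Return why the loop should end, or None to run another iteration."""
    if state.goal_achieved:
        return "goal achieved"
    if state.max_iterations is not None and state.iteration >= state.max_iterations:
        return "max iterations reached"
    if state.consecutive_failures >= state.max_consecutive_failures:
        return "too many consecutive failures"
    return None


def on_turn_exit(state: LoopState) -> Optional[str]:
    """What a stop hook might do when a turn ends."""
    reason = stop_reason(state)
    if reason is None:
        state.iteration += 1               # the next experiment begins
    return reason


state = LoopState(max_iterations=30, max_consecutive_failures=2)
history = []
for verdict in ["keep", "discard", "discard", "keep"]:
    record_verdict(state, verdict)
    reason = on_turn_exit(state)
    history.append((verdict, reason))
    if reason is not None:
        break
print(history)
```

In this run the loop ends on the second consecutive discard, so the final "keep" in the list is never reached.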

Install

claude plugins add namjug-kim/universal-autoresearch-claude-plugin

Quick start

# With inline goal
/autoresearch "Optimize bundle size" --max-iterations 30

# With config file
/autoresearch --config ./autoresearch.config.md

# Cancel at any time
/cancel-autoresearch

Config file format

The config file uses Markdown with YAML frontmatter:

---
name: "Bundle Size Optimization"
max_iterations: 30
max_consecutive_failures: 5
---

## Goal

Minimize the web application bundle size while maintaining all existing functionality.

## Evaluation Criteria

- Bundle size (KB) should decrease compared to the current best
- ALL existing tests must pass (`pnpm test --run`)
- No functionality may be removed or degraded

## Execution Method

Run these commands to collect experiment results:

1. `pnpm build 2>&1` — Build the application
2. `du -sh dist/` — Measure total bundle size
3. `pnpm test --run 2>&1 | tail -20` — Run tests

## Constraints

- Do not add new external dependencies
- Do not modify test files

See templates/config.example.md for a full example.
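
To make the layout concrete, here is a minimal Python sketch that splits such a file into its frontmatter fields and its `## ` sections. It is not how the plugin reads the config (the agent reads the Markdown directly), and `parse_config` is a hypothetical helper. The hand-rolled parser only handles flat `key: value` frontmatter; a real YAML library would be more robust.

```python
import re
from typing import Dict, Tuple


def parse_config(text: str) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Split a config into frontmatter fields and '## ' sections."""
    match = re.match(r"\A---\n(.*?)\n---\n?(.*)\Z", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("config must start with a '---' frontmatter block")
    front, body = match.groups()

    meta: Dict[str, object] = {}
    for line in front.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        # Treat purely numeric values as integers, everything else as text.
        meta[key.strip()] = int(value) if value.isdigit() else value

    collected: Dict[str, list] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            collected[current] = []
        elif current is not None:
            collected[current].append(line)
    sections = {name: "\n".join(lines).strip() for name, lines in collected.items()}
    return meta, sections


EXAMPLE = """---
name: "Bundle Size Optimization"
max_iterations: 30
max_consecutive_failures: 5
---

## Goal

Minimize the web application bundle size.

## Constraints

- Do not add new external dependencies
- Do not modify test files
"""

meta, sections = parse_config(EXAMPLE)
print(meta)
print(list(sections))
```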

Files created in your project

File	Purpose	Git tracked?
`autoresearch.config.md`	Goal + evaluation criteria	Yes
`experiment-journal.md`	Structured experiment history	No
`results.tsv`	Quick metrics log	No
`run.log`	Latest experiment output	No
`.claude/autoresearch.local.md`	Loop state (internal)	No

Options

Option	Default	Description
`--config <path>`	`./autoresearch.config.md`	Path to config file
`--max-iterations <n>`	unlimited	Stop after N experiments
`--max-consecutive-failures <n>`	5	Stop after N consecutive discards
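
The defaults in the table can be mirrored with a short `argparse` sketch. The plugin itself parses arguments in scripts/setup-autoresearch.sh, so this Python version is only an illustration. The positional `goal` argument is an assumption based on the inline-goal form shown in Quick start, and `None` stands in for "unlimited".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="autoresearch")
    # Optional inline goal, as in: /autoresearch "Optimize bundle size"
    parser.add_argument("goal", nargs="?", default=None, help="inline goal text")
    parser.add_argument("--config", default="./autoresearch.config.md",
                        help="path to config file")
    parser.add_argument("--max-iterations", type=int, default=None,
                        help="stop after N experiments (default: unlimited)")
    parser.add_argument("--max-consecutive-failures", type=int, default=5,
                        help="stop after N consecutive discards")
    return parser


args = build_parser().parse_args(["Optimize bundle size", "--max-iterations", "30"])
print(args)
```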

Architecture

The plugin uses the same stop hook pattern as ralph-loop: a state file + stop hook + prompt template.

commands/
  autoresearch.md          # /autoresearch slash command
  cancel-autoresearch.md   # /cancel-autoresearch slash command
  help.md                  # /autoresearch-help slash command
hooks/
  hooks.json               # Registers the stop hook
  stop-hook.sh             # Loop controller: checks termination, re-injects prompt
scripts/
  setup-autoresearch.sh    # Parses CLI args, creates state file, initializes journal
templates/
  loop-prompt.md           # Core experiment iteration instructions
  eval-prompt.md           # Blind-review evaluation sub-agent prompt
  config.example.md        # Example config file

The state file (.claude/autoresearch.local.md) tracks the iteration count, the consecutive-failure count, and the session ID. The stop hook reads this file on each turn exit to decide whether to end the loop or re-inject the loop prompt for another iteration.

Session isolation: the state file records the session ID at creation. The stop hook only activates for the session that started the loop, so other Claude Code sessions in the same project are unaffected.
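
The session check can be sketched as follows. This Python version is illustrative and is not the logic of hooks/stop-hook.sh. It assumes that the hook input is a JSON object carrying a `session_id` field and that an exit is blocked by emitting JSON with `decision` and `reason` fields, as described in Claude Code's hooks documentation. The `state` dictionary stands in for the parsed state file, whose real format is not shown here.

```python
import json
from typing import Optional


def hook_response(hook_input: dict, state: Optional[dict], loop_prompt: str) -> Optional[str]:
    """Return JSON that blocks the exit, or None to let the session stop."""
    if state is None:
        return None                        # no active loop in this project
    owner = state.get("session_id")
    if not owner or hook_input.get("session_id") != owner:
        return None                        # a different session: leave it alone
    # Blocking the stop and supplying a reason feeds the prompt back to the agent.
    return json.dumps({"decision": "block", "reason": loop_prompt})


# Hypothetical session IDs, for demonstration only.
state = {"session_id": "abc-123", "iteration": 4}
owner = hook_response({"session_id": "abc-123"}, state, "Run the next experiment.")
other = hook_response({"session_id": "zzz-999"}, state, "Run the next experiment.")
print(owner)
print(other)
```

Only the session that created the state file gets the prompt re-injected. Every other session falls through to `None` and exits normally.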
