Iterate anything. Prove everything. Autonomous iteration engine with evidence-chain tracking.
npx claudepluginhub waynejing995/autoresearch-xIterate anything. Prove everything. Autonomous iteration engine with evidence-chain tracking for optimization, debugging, and investigation workflows.
Iterate anything. Prove everything.
An autonomous iteration engine for Claude Code with evidence-chain tracking. Set up a tracked run and iterate autonomously — optimizing code, debugging failures, or investigating questions — until the target is met or the budget is exhausted.
Inspired by karpathy/autoresearch — generalized to any domain.
Most AI coding agents try one fix, check if it worked, and move on. Real engineering problems require structured iteration: controlled experiments, hypothesis tracking, metric-driven decisions, and clear evidence chains.
autoresearch-x enforces scientific rigor:
results.tsv| Mode | Goal | Phases | Use When |
|---|---|---|---|
| optimize | Improve a metric | baseline → iterate | Reducing latency, improving throughput, lowering error rates |
| debug | Fix a failure | observe → diagnose → fix | Systematic debugging with hypothesis tracking |
| investigate | Answer questions | gather → analyze → conclude | Log analysis, root cause investigation, evidence-based research |
Main Agent (orchestrator)
├── Worker subagent — executes code changes (cannot see eval results)
├── Evaluator subagent — runs eval commands (cannot see code changes)
├── Reviewer subagent — validates program.md before runs start
└── Strategist subagent — dispatched after 5 consecutive discards to pivot strategy
Guardrail Hooks
├── scope-guard — blocks edits outside declared scope
├── iteration-gate — enforces one-change-per-iteration rule
├── eval-bypass-detector — catches attempts to run eval directly
└── completion-check — blocks stopping without a final report
# Add as a marketplace, then install
claude plugin marketplace add --source github --repo waynejing995/autoresearch-x
claude plugin install autoresearch-x@autoresearch-x
# Or clone manually
git clone https://github.com/waynejing995/autoresearch-x.git ~/.claude/plugins/autoresearch-x
git clone https://github.com/waynejing995/autoresearch-x.git ~/.codex/skills/autoresearch-x
Clone into your tool's plugin directory:
git clone https://github.com/waynejing995/autoresearch-x.git <your-plugin-dir>/autoresearch-x
cd ~/.claude/plugins/autoresearch-x && git pull
Once installed, invoke via natural language or slash command:
# Interactive setup (guided questions)
/autoresearch-x
# Start from a template
/autoresearch-x --template optimize
/autoresearch-x --template debug
/autoresearch-x --template investigate
# Use an existing program.md
/autoresearch-x --program path/to/program.md
# Resume a previous run
/autoresearch-x resume <tag>
# Check status
/autoresearch-x status
Or just describe your task naturally:
"My API endpoint is responding in 400ms, I need it under 200ms. The benchmark is at scripts/bench.py. Iterate on it overnight."
"test_auth keeps failing with 403 but only on Mondays. Debug it systematically."
"Investigate why webhook processing has been unreliable this month. Check logs, analyze patterns, give me an evidence-backed report."
Every iteration follows this exact protocol:
git commit -m "iter N: <description>"results.tsviterations/<commit>.md, update report.mdresults.tsv:
2026-03-23T10:01 a1b2c3d baseline keep 312 baseline: 312ms
2026-03-23T10:07 b2c3d4e iterate keep 245 added connection pooling
2026-03-23T10:14 c3d4e5f iterate discard 251 tried async handlers — worse
2026-03-23T10:20 d4e5f6g iterate keep 189 query result caching — target met!
Development marketplace for Superpowers core skills library
Harness-native ECC skills, hooks, rules, MCP conventions, and operator workflows
Open Design — local-first design app exposed to coding agents over MCP. Install once with your agent's plugin command and projects/files/skills are reachable through stdio.