By waynejing995
Iterate anything. Prove everything. Autonomous iteration engine with evidence-chain tracking for optimization, debugging, and investigation workflows.
Runs evaluation commands, extracts metrics, and reports raw results for autoresearch-x iteration loops. Does NOT see what code was changed — isolation prevents rationalization of results.
Reviews draft program.md for autoresearch-x runs before user approval. Validates target clarity, feasibility, eval rules, and anti-patterns. Explores the codebase to verify claims. Returns structured review. Does NOT modify any files — read-only verification.
Mind explosion agent for autoresearch-x. Invoked when ALL branches are stalled (5+ consecutive discards each). Analyzes failure patterns across all branches, researches externally for novel approaches, and proposes fundamentally new strategies + optional program.md revisions. Read-only — cannot modify code or run evaluations.
Executes code modifications, adds instrumentation, gathers data, and runs scripts for autoresearch-x iteration loops. Handles the "do the work" side of each iteration. Does NOT see evaluation results — isolation prevents confirmation bias.
Executes bash commands
Hook triggers when Bash tool is used
Modifies files
Hook triggers on file write and edit operations
Own this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimOwn this plugin?
Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).
Sign in to claimBased on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
Uses power tools
Uses Bash, Write, or Edit tools
Uses power tools
Uses Bash, Write, or Edit tools
Iterate anything. Prove everything.
An autonomous iteration engine for Claude Code with evidence-chain tracking. Set up a tracked run and iterate autonomously — optimizing code, debugging failures, or investigating questions — until the target is met or the budget is exhausted.
Inspired by karpathy/autoresearch — generalized to any domain.
Most AI coding agents try one fix, check if it worked, and move on. Real engineering problems require structured iteration: controlled experiments, hypothesis tracking, metric-driven decisions, and clear evidence chains.
autoresearch-x enforces scientific rigor:
results.tsv| Mode | Goal | Phases | Use When |
|---|---|---|---|
| optimize | Improve a metric | baseline → iterate | Reducing latency, improving throughput, lowering error rates |
| debug | Fix a failure | observe → diagnose → fix | Systematic debugging with hypothesis tracking |
| investigate | Answer questions | gather → analyze → conclude | Log analysis, root cause investigation, evidence-based research |
Main Agent (orchestrator)
├── Worker subagent — executes code changes (cannot see eval results)
├── Evaluator subagent — runs eval commands (cannot see code changes)
├── Reviewer subagent — validates program.md before runs start
└── Strategist subagent — dispatched after 5 consecutive discards to pivot strategy
Guardrail Hooks
├── scope-guard — blocks edits outside declared scope
├── iteration-gate — enforces one-change-per-iteration rule
├── eval-bypass-detector — catches attempts to run eval directly
└── completion-check — blocks stopping without a final report
# Add as a marketplace, then install
claude plugin marketplace add --source github --repo waynejing995/autoresearch-x
claude plugin install autoresearch-x@autoresearch-x
# Or clone manually
git clone https://github.com/waynejing995/autoresearch-x.git ~/.claude/plugins/autoresearch-x
git clone https://github.com/waynejing995/autoresearch-x.git ~/.codex/skills/autoresearch-x
Clone into your tool's plugin directory:
git clone https://github.com/waynejing995/autoresearch-x.git <your-plugin-dir>/autoresearch-x
cd ~/.claude/plugins/autoresearch-x && git pull
Once installed, invoke via natural language or slash command:
# Interactive setup (guided questions)
/autoresearch-x
# Start from a template
/autoresearch-x --template optimize
/autoresearch-x --template debug
/autoresearch-x --template investigate
# Use an existing program.md
/autoresearch-x --program path/to/program.md
# Resume a previous run
/autoresearch-x resume <tag>
# Check status
/autoresearch-x status
Or just describe your task naturally:
"My API endpoint is responding in 400ms, I need it under 200ms. The benchmark is at scripts/bench.py. Iterate on it overnight."
"test_auth keeps failing with 403 but only on Mondays. Debug it systematically."
"Investigate why webhook processing has been unreliable this month. Check logs, analyze patterns, give me an evidence-backed report."
Every iteration follows this exact protocol:
git commit -m "iter N: <description>"results.tsviterations/<commit>.md, update report.mdresults.tsv:
2026-03-23T10:01 a1b2c3d baseline keep 312 baseline: 312ms
2026-03-23T10:07 b2c3d4e iterate keep 245 added connection pooling
2026-03-23T10:14 c3d4e5f iterate discard 251 tried async handlers — worse
2026-03-23T10:20 d4e5f6g iterate keep 189 query result caching — target met!
npx claudepluginhub waynejing995/autoresearch-x --plugin autoresearch-xAutonomous improvement engine for Claude Code. Runs an unbounded modify-verify-keep/discard loop against any mechanical metric. 10 subcommands: plan, debug, fix, security, ship, scenario, predict, learn, and reason.
Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).
Use when debugging, investigating root causes, designing experiments, or performing scientific analysis — enforces hypothesis-driven reasoning, evidence-first observation, causality validation, and structured output templates. Use when facing unknowns, repeated failures, or complex investigations requiring rigorous methodology.
Debug issues systematically with root cause analysis and execution tracing
Autonomous experimentation skill — your AI coding agent designs experiments, tests hypotheses, discards failures, keeps wins. Runs overnight while you sleep.
Superpowers Plus core skills library for Claude Code: planning, execution routing, TDD, debugging, and collaboration workflows