Search everything...

Stats

Actions

Available In

autoresearch-x

Name: autoresearch-x
Author: waynejing995

By waynejing995

Iterate anything. Prove everything. Autonomous iteration engine with evidence-chain tracking for optimization, debugging, and investigation workflows.

npx claudepluginhub waynejing995/autoresearch-x --plugin autoresearch-x

Popularity

Stars

Above avg

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents4

evaluator

/evaluator

Runs evaluation commands, extracts metrics, and reports raw results for autoresearch-x iteration loops. Does NOT see what code was changed — isolation prevents rationalization of results.

reviewer

/reviewer

Reviews draft program.md for autoresearch-x runs before user approval. Validates target clarity, feasibility, eval rules, and anti-patterns. Explores the codebase to verify claims. Returns structured review. Does NOT modify any files — read-only verification.

strategist

/strategist

Mind explosion agent for autoresearch-x. Invoked when ALL branches are stalled (5+ consecutive discards each). Analyzes failure patterns across all branches, researches externally for novel approaches, and proposes fundamentally new strategies + optional program.md revisions. Read-only — cannot modify code or run evaluations.

worker

/worker

Executes code modifications, adds instrumentation, gathers data, and runs scripts for autoresearch-x iteration loops. Handles the "do the work" side of each iteration. Does NOT see evaluation results — isolation prevents confirmation bias.

Skills1

autoresearch-x

/autoresearch-x

Iterate anything. Prove everything. Autonomous iteration engine with evidence-chain tracking. Use when: "iterate on X", "optimize X until Y", "autoresearch", "auto-research", "run experiment loop", "debug X systematically", "investigate X", "analyze X with checklist", "loop until fixed", "overnight experiment", "keep trying until it works", or any task requiring autonomous iteration with structured tracking. Supports three modes: optimize (metric-driven code iteration), debug (phased diagnosis with hypothesis tracking), investigate (evidence-based research/analysis). Inspired by karpathy/autoresearch — generalized to any domain.

Hooks1

Event Hooks

Bash

File writes

4 hooks across 3 events

Stats

Version2.1.0

LanguagePython

Stars2

MaintenanceExcellent

LicenseMIT

Last CommitApr 3, 2026

AddedMar 28, 2026

Actions

View on GitHub View README Plugin Marketplace JSON

Own this plugin?

Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).

Available In

autoresearch-x2

Safety Signals

Caution

Executes bash commands

Hook triggers when Bash tool is used

Modifies files

Hook triggers on file write and edit operations

README

autoresearch-x

Iterate anything. Prove everything.

An autonomous iteration engine for Claude Code with evidence-chain tracking. Set up a tracked run and iterate autonomously — optimizing code, debugging failures, or investigating questions — until the target is met or the budget is exhausted.

Inspired by karpathy/autoresearch — generalized to any domain.

Why

Most AI coding agents try one fix, check if it worked, and move on. Real engineering problems require structured iteration: controlled experiments, hypothesis tracking, metric-driven decisions, and clear evidence chains.

autoresearch-x enforces scientific rigor:

One change per iteration — isolate variables so you know what actually worked
Separated execution and evaluation — the agent that writes the fix cannot judge if it worked (prevents confirmation bias)
Evidence chains — every decision is backed by data, tracked in results.tsv
Guardrail hooks — scope guards, iteration gates, and completion checks enforce discipline automatically

Three Modes

Mode	Goal	Phases	Use When
optimize	Improve a metric	baseline → iterate	Reducing latency, improving throughput, lowering error rates
debug	Fix a failure	observe → diagnose → fix	Systematic debugging with hypothesis tracking
investigate	Answer questions	gather → analyze → conclude	Log analysis, root cause investigation, evidence-based research

Architecture

Main Agent (orchestrator)
├── Worker subagent      — executes code changes (cannot see eval results)
├── Evaluator subagent   — runs eval commands (cannot see code changes)
├── Reviewer subagent    — validates program.md before runs start
└── Strategist subagent  — dispatched after 5 consecutive discards to pivot strategy

Guardrail Hooks
├── scope-guard          — blocks edits outside declared scope
├── iteration-gate       — enforces one-change-per-iteration rule
├── eval-bypass-detector — catches attempts to run eval directly
└── completion-check     — blocks stopping without a final report

Installation

Claude Code (Recommended)

# Add as a marketplace, then install
claude plugin marketplace add --source github --repo waynejing995/autoresearch-x
claude plugin install autoresearch-x@autoresearch-x

# Or clone manually
git clone https://github.com/waynejing995/autoresearch-x.git ~/.claude/plugins/autoresearch-x

Codex

git clone https://github.com/waynejing995/autoresearch-x.git ~/.codex/skills/autoresearch-x

Other AI Tools

Clone into your tool's plugin directory:

git clone https://github.com/waynejing995/autoresearch-x.git <your-plugin-dir>/autoresearch-x

Update

cd ~/.claude/plugins/autoresearch-x && git pull

Usage

Once installed, invoke via natural language or slash command:

# Interactive setup (guided questions)
/autoresearch-x

# Start from a template
/autoresearch-x --template optimize
/autoresearch-x --template debug
/autoresearch-x --template investigate

# Use an existing program.md
/autoresearch-x --program path/to/program.md

# Resume a previous run
/autoresearch-x resume <tag>

# Check status
/autoresearch-x status

Or just describe your task naturally:

"My API endpoint is responding in 400ms, I need it under 200ms. The benchmark is at scripts/bench.py. Iterate on it overnight."

"test_auth keeps failing with 403 but only on Mondays. Debug it systematically."

"Investigate why webhook processing has been unreliable this month. Check logs, analyze patterns, give me an evidence-backed report."

The 9-Step Protocol

Every iteration follows this exact protocol:

Review & Plan — analyze previous results, state ONE specific change to try
Dispatch Worker — send the change to the worker subagent
Review Diff — verify scope compliance and single-change rule
Commit — git commit -m "iter N: <description>"
Dispatch Evaluator — send eval command to the evaluator subagent
Decide — keep (metric improved) or discard (metric same/worse)
Act — advance branch or revert to previous commit
Record — append row to results.tsv
Write Detail — create iterations/<commit>.md, update report.md

Example Runs

Optimize: API Latency

results.tsv:
2026-03-23T10:01  a1b2c3d  baseline  keep     312  baseline: 312ms
2026-03-23T10:07  b2c3d4e  iterate   keep     245  added connection pooling
2026-03-23T10:14  c3d4e5f  iterate   discard  251  tried async handlers — worse
2026-03-23T10:20  d4e5f6g  iterate   keep     189  query result caching — target met!

Debug: Auth Failure

View full README on GitHub

autoresearch-x

Popularity

What's Inside

Confidence

README

autoresearch-x

Why

Three Modes

Architecture

Installation

Claude Code (Recommended)

Codex

Other AI Tools

Update

Usage

The 9-Step Protocol

Example Runs

Optimize: API Latency

Debug: Auth Failure

Similar Plugins

autoresearch

autoresearch-agent

scientific-method

bug-detective

researcher

superpowers-plus

autoresearch-x

Why

Three Modes

Architecture

Installation

Claude Code (Recommended)

Codex

Other AI Tools

Update

Usage

The 9-Step Protocol

Example Runs

Optimize: API Latency

Debug: Auth Failure

Popularity

Health & Quality

Similar Plugins

autoresearch

autoresearch-agent

scientific-method

bug-detective

researcher

superpowers-plus