Autoresearch

Turn Claude Code, OpenCode, or OpenAI Codex into a relentless improvement engine.

Based on Karpathy's autoresearch — constraint + mechanical metric + autonomous iteration = compounding gains.

"Set the GOAL → The agent runs the LOOP → You wake up to results"

You don't need AGI. You need a goal, a metric, and a loop that never quits.

Now supports Claude Code, OpenCode, and OpenAI Codex.

      PLAN              LOOP             DEBUG              FIX            SECURE            SHIP
 ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
 │   Goal   │     │  Modify  │     │   Find   │     │   Fix    │     │  STRIDE  │     │  Stage   │
 │  Metric  │────▶│  Verify  │────▶│   Bugs   │────▶│  Errors  │────▶│  OWASP   │────▶│  Deploy  │
 │  Scope   │     │  Keep/   │     │  Trace   │     │  Repair  │     │  Red     │     │ Release  │
 └──────────┘     │  Discard │     └──────────┘     └──────────┘     │  Team    │     └──────────┘
/autoresearch:    └──────────┘    /autoresearch:    /autoresearch:   └──────────┘    /autoresearch:
  plan            /autoresearch     debug              fix          /autoresearch:      ship
                                                                     security

                  ┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
                  │ Scenario │     │ Predict  │     │  Learn   │     │  Reason  │
                  │   Edge   │     │ 5-Expert │     │   Docs   │     │  Debate  │
                  │   Cases  │     │  Swarm   │     │   Gen    │     │ Converge │
                  └──────────┘     └──────────┘     └──────────┘     └──────────┘
                 /autoresearch:   /autoresearch:   /autoresearch:   /autoresearch:
                   scenario         predict           learn           reason

Why This Exists

Karpathy's autoresearch demonstrated that a 630-line Python script could autonomously improve ML models overnight, running 100 experiments per night, by following five simple principles: one metric, constrained scope, fast verification, automatic rollback, and git as memory.

Claude Autoresearch generalizes these principles to any domain with a number you can measure: not just ML, but also code, content, marketing, sales, HR, and DevOps.

How It Works

LOOP (FOREVER or N times):
  1. Review current state + git history + results log
  2. Pick the next change (based on what worked, what failed, what's untried)
  3. Make ONE focused change
  4. Git commit (before verification)
  5. Run mechanical verification (tests, benchmarks, scores)
  6. If improved → keep. If worse → git revert. If crashed → fix or skip.
  7. Log the result
  8. Repeat. Never stop until you interrupt (or N iterations complete).

Every improvement stacks. Every failure auto-reverts. Progress is logged in TSV format.
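
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual implementation. The names `autoresearch_loop`, `apply_change`, `revert_change`, and `measure` are hypothetical; the callbacks stand in for the agent's edit plus `git commit` and for `git revert`. The sketch also assumes that a higher metric is better, that ties are discarded, that a crash is handled by reverting and skipping, and that the TSV log has the four columns shown.

```python
import csv
from typing import Callable, Iterable, TextIO


def autoresearch_loop(
    candidates: Iterable,                      # hypothetical queue of changes to try
    apply_change: Callable[[object], None],    # stand-in for "edit + git commit"
    revert_change: Callable[[object], None],   # stand-in for "git revert"
    measure: Callable[[], float],              # mechanical metric (assumed: higher is better)
    log: TextIO,                               # destination for the TSV results log
) -> float:
    """Try one change per iteration; keep it only if the metric improves."""
    writer = csv.writer(log, delimiter="\t", lineterminator="\n")
    writer.writerow(["iteration", "change", "metric", "status"])

    # Iteration 0: measure the untouched state as the baseline.
    best = measure()
    writer.writerow([0, "baseline", best, "baseline"])

    for i, change in enumerate(candidates, start=1):
        apply_change(change)          # commit before verification
        try:
            score = measure()         # run the mechanical verification
        except Exception:
            # Assumption: on a crash this sketch reverts and skips
            # rather than attempting a fix.
            revert_change(change)
            writer.writerow([i, change, "", "crash"])
            continue

        if score > best:
            best = score              # improvement: keep the commit
            status = "keep"
        else:
            revert_change(change)     # worse (or equal): roll back
            status = "discard"
        writer.writerow([i, change, score, status])

    return best
```

As a toy usage, take a list as the state and its sum as the metric: with `apply_change` appending to the list and `revert_change` popping from it, trying the changes 3, -2, 5 keeps 3 and 5, discards -2, and returns a best metric of 8. In a real run the callbacks would shell out to git and to the project's test or benchmark command.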

The Setup Phase

Before looping, Claude performs a one-time setup:

1. Read context: reads all in-scope files
2. Define goal: extracts or asks for a mechanical metric
3. Define scope: which files can be modified vs read-only
4. Establish baseline: runs verification on current state (iteration #0)
5. Confirm and go: shows setup, then begins the loop
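
One way to picture the result of the setup phase is as a small record holding the goal, the metric, the scope, and the baseline. The sketch below is an assumption for illustration only: the `Setup` class, its field names, and the use of glob patterns for scope are hypothetical and are not taken from the plugin.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, List, Optional


@dataclass
class Setup:
    """Hypothetical record of what the one-time setup phase decides."""
    goal: str                                  # human-readable goal
    verify: Callable[[], float]                # mechanical metric to run
    writable: List[str]                        # glob patterns the agent may modify
    read_only: List[str] = field(default_factory=list)  # patterns it must not touch
    baseline: Optional[float] = None           # filled in by iteration #0

    def can_modify(self, path: str) -> bool:
        # Read-only patterns win over writable ones (an assumption of this sketch).
        if any(fnmatchcase(path, pattern) for pattern in self.read_only):
            return False
        return any(fnmatchcase(path, pattern) for pattern in self.writable)

    def establish_baseline(self) -> float:
        # Run verification once on the current state before any change is made.
        self.baseline = self.verify()
        return self.baseline
```

With `writable=["src/*.py"]` and `read_only=["src/generated_*.py"]`, `can_modify("src/app.py")` is true, while `src/generated_api.py` and `README.md` are both off limits. Calling `establish_baseline()` stores the metric for the unmodified state, which later iterations are compared against.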
