autoresearch

Name: autoresearch
Author: wjgoarxiv

Autonomous research loops with 10 commands. Generalizes Karpathy's autoresearch loop to any domain with mechanical evaluation, overnight persistence, and zero dependencies.

npx claudepluginhub wjgoarxiv/autoresearch-skill

Popularity

Stars: Top 25% (Med: 0 · Avg: 285)

Installs: Med: 0 · Avg: 1

What's Inside

Skills: 9

autoresearch

/autoresearch

Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted.

TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.

autoresearch:debug

/debug

Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder.

TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug.

DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.

autoresearch:fix

/fix

Iterative error-crusher loop that auto-stops at 0 errors. Cascade-aware: fixes dependency errors before their dependents. Refuses anti-patterns that hide errors instead of fixing them.

TRIGGER when: user has errors or failures to fix iteratively; user asks to "fix all errors"; user has a failing test suite; user has compilation errors; user has linter errors; user wants systematic error elimination; user invokes /autoresearch:fix.

DO NOT TRIGGER when: user wants a one-shot fix for a single obvious bug; user wants debugging guidance only; user wants code review without fixing.

autoresearch:plan

/plan

7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file.

TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan.

DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.

autoresearch:predict

/predict

Multi-perspective deliberation engine. Gathers independent positions from diverse personas, runs cross-examination and rebuttal rounds, detects herd behavior, and synthesizes a neutral judge verdict with confidence levels.

TRIGGER when: user wants multi-perspective prediction, forecasting, scenario analysis, decision analysis, "what will happen if", "should we", "predict the outcome of", structured devil's advocacy, or any question benefiting from adversarial deliberation.

Stats

Version: 2.0.0

Language: Shell

Stars: 14

Forks: 2

Maintenance: Excellent

License: MIT

Last Commit: Apr 5, 2026

Added: Apr 5, 2026

README

autoresearch-skill

Define a goal. Let the agent research, experiment, and iterate -- autonomously.

autoresearch-skill in Action

#	Example	Result	Iterations	Evaluator
1	Code Optimization — Sort 1M integers faster	2.12s → 0.15s (−93%)	8	`benchmark.py`
2	Function Fitting — Discover hidden math function	RMSE 2.11 → 0.030 (−99%)	8	`evaluate.py`
3	Skill Elaboration — Improve P&ID analysis skill	0.28 → 0.98 composite (+255%)	2	`evaluate.py`
4	Literature Review — Exercise timing papers	1/8 → 8/8 categories, 19 papers	4	Agent (Tier 2)
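
The Evaluator column names scripts such as `benchmark.py`, whose contents are not shown on this page. As a rough illustration of what a mechanical evaluator for example 1 could look like, the sketch below times a sort function and returns a single number. The function name `benchmark`, its parameters, and its defaults are assumptions made for illustration, not the repository's actual script.

```python
import random
import time


def benchmark(sort_fn=sorted, n=1_000_000, repeats=3, seed=0):
    """Return the best wall-clock time in seconds for sorting n random integers.

    Hypothetical evaluator: every name and default here is illustrative only.
    """
    rng = random.Random(seed)  # fixed seed so each iteration sees the same input
    data = [rng.randrange(n) for _ in range(n)]
    expected = sorted(data)  # reference answer, computed outside the timed region
    best = float("inf")
    for _ in range(repeats):
        sample = list(data)  # fresh copy so in-place sorts cannot affect later runs
        start = time.perf_counter()
        result = sort_fn(sample)
        elapsed = time.perf_counter() - start
        # Reject a "fast" candidate that does not actually sort.
        if result != expected:
            raise ValueError("candidate produced an incorrectly sorted result")
        best = min(best, elapsed)
    return best
```

Checking correctness inside the evaluator matters for a loop like this: without it, a candidate could score well on speed by returning wrong output. Taking the best of several repeats is one common way to reduce timing noise; a median would be an equally reasonable choice.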

[!NOTE] An LLM skill that turns natural-language research goals into autonomous experiment-evaluate-iterate loops -- inspired by Karpathy's autoresearch. Write a research.md, and the agent handles hypothesis generation, experimentation, evaluation, and iteration. Works with Claude Code, Codex CLI, and Gemini CLI.

Features

Karpathy-Inspired Loop -- Autonomous experiment -> evaluate -> keep/revert cycle, generalized beyond ML training
Natural Language Programming -- research.md is your program: define goals, metrics, and constraints in plain English
Zero Dependencies -- Python stdlib only. No pip packages required for core functionality
Multi-Agent Compatible -- Works with Claude Code, Codex CLI, and Gemini CLI out of the box
Automatic Rollback -- Failed experiments are reverted automatically; only improvements are kept
Full Audit Trail -- Every iteration logged to research_log.md with timestamps, changes, and results
3-Tier Environment Detection -- Adapts to your runtime: full experimentation (Tier 1), research-only (Tier 2), or analysis-only (Tier 3)
Safety Built In -- Max iterations, pause-for-review intervals, forbidden-change boundaries, and time budgets
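
The experiment -> evaluate -> keep/revert cycle and the max-iterations safety limit listed above can be sketched generically. This is a minimal illustration of the pattern under stated assumptions, not the skill's actual code: `research_loop`, `propose`, and `evaluate` are hypothetical names, the metric is assumed to be lower-is-better, and the log is an in-memory list standing in for `research_log.md`.

```python
def research_loop(baseline, propose, evaluate, target, max_iterations=10):
    """Keep a candidate only if it improves a lower-is-better metric.

    Hypothetical sketch: all names are illustrative.
    """
    best = baseline
    best_score = evaluate(best)
    log = [(0, best_score, "baseline")]
    for i in range(1, max_iterations + 1):
        if best_score <= target:
            break  # target metric achieved, stop early
        candidate = propose(best, i)  # experiment
        score = evaluate(candidate)   # mechanical evaluation
        if score < best_score:
            best, best_score = candidate, score  # keep the improvement
            log.append((i, score, "kept"))
        else:
            log.append((i, score, "reverted"))   # revert: best stays untouched
    return best, best_score, log


# Toy usage: minimize (x - 3)^2 starting from 0, always proposing x + 2.
best, score, log = research_loop(
    baseline=0,
    propose=lambda x, i: x + 2,
    evaluate=lambda x: (x - 3) ** 2,
    target=0,
    max_iterations=3,
)
print(best, score)  # 2 1
print(log)  # [(0, 9, 'baseline'), (1, 1, 'kept'), (2, 1, 'reverted'), (3, 1, 'reverted')]
```

In the toy run, the second and third proposals do not beat the current best, so they are logged and discarded. The iteration budget is what stops the loop when the target is never reached.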

Command Inventory

Command	Purpose
`/autoresearch`	Core 5-stage loop — understand, hypothesize, experiment, evaluate, log & iterate
`/autoresearch:plan`	7-step setup wizard that produces a ready-to-run `research.md`
`/autoresearch:debug`	Scientific bug hunting with falsifiable hypotheses and evidence tables
`/autoresearch:fix`	Iterative error crusher — runs until error count reaches zero
`/autoresearch:predict`	Multi-persona deliberation with anti-herd-bias detection
`/autoresearch:security`	STRIDE + OWASP iterative security audit
`/autoresearch:scenario`	12-dimension scenario exploration for decision analysis
`/autoresearch:reason`	Adversarial refinement with blind-judge scoring panel
`/autoresearch:ship`	Universal shipping workflow supporting 9 ship types
`/autoresearch:learn`	(planned) Self-improving skill loop from feedback
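
The skill description for `/autoresearch:fix` calls it cascade-aware, meaning dependency errors are fixed before their dependents. How the skill determines that order is not documented on this page. One plausible way to express the idea is a topological sort, sketched below with Python's standard-library `graphlib` (Python 3.9+). The function `fix_order` and the error names are hypothetical.

```python
from graphlib import TopologicalSorter


def fix_order(depends_on):
    """Order error IDs so each error comes after the errors it depends on.

    `depends_on` maps an error ID to the set of error IDs to fix first.
    Hypothetical sketch, not the skill's actual algorithm.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Toy usage: a failing test caused by a type error caused by a missing import.
errors = {
    "test_failure": {"type_error"},
    "type_error": {"missing_import"},
    "missing_import": set(),
}
print(fix_order(errors))  # ['missing_import', 'type_error', 'test_failure']
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, so a caller would need to decide how to break such loops.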

Quick Decision Guide

What do you want to do?

Goal	Use
Optimize something iteratively toward a numeric target	`/autoresearch`
Set up a new research project from scratch	`/autoresearch:plan`
Hunt down a hard-to-reproduce bug	`/autoresearch:debug`
Crush all errors in a codebase to zero	`/autoresearch:fix`
Forecast outcomes or predict what will happen	`/autoresearch:predict`
Audit a system for security vulnerabilities	`/autoresearch:security`
Explore "what if" scenarios before committing to a path	`/autoresearch:scenario`
Think through a complex decision rigorously	`/autoresearch:reason`
Release a feature, library, or artifact	`/autoresearch:ship`

Why This Skill?

Other autoresearch implementations provide the loop concept. This repo provides the complete toolkit:
