By wjgoarxiv
Autonomous research loops with 10 commands. Generalizes Karpathy's autoresearch loop to any domain with mechanical evaluation, overnight persistence, and zero dependencies.
Core autonomous research loop. Reads research.md, proposes hypotheses, runs experiments, evaluates results mechanically, keeps improvements, discards failures, and iterates until the target metric is achieved or the iteration budget is exhausted. TRIGGER when: user invokes "autoresearch" (no subcommand); research.md exists; user wants the 5-stage loop; user wants iterative optimization overnight.
Scientific bug hunting using falsifiable hypotheses. Forms hypotheses, designs falsifying tests, eliminates candidates systematically, and logs the full investigation trail in a structured debug/ folder. TRIGGER when: user has a bug to investigate scientifically; user wants systematic root-cause analysis; user says "debug", "investigate", "root cause", "why is this failing"; user invokes /autoresearch:debug. DO NOT TRIGGER when: user wants to optimize a metric (use /autoresearch); user wants to fix a known error automatically (use /autoresearch:fix); user just wants a quick one-line answer about what a function does.
Iterative error-crusher loop that auto-stops at 0 errors. Cascade-aware: fixes dependency errors before their dependents. Refuses anti-patterns that hide errors instead of fixing them. TRIGGER when: user has errors or failures to fix iteratively; user asks to "fix all errors"; user has a failing test suite; user has compilation errors; user has linter errors; user wants systematic error elimination; user invokes /autoresearch:fix. DO NOT TRIGGER when: user wants a one-shot fix for a single obvious bug; user wants debugging guidance only; user wants code review without fixing.
7-step setup wizard that produces a complete, ready-to-run research.md without executing the research loop. Walks the user through goal, metric, search space, constraints, evaluator design, and baseline measurement, then writes the file. TRIGGER when: user wants to set up a research project; user wants to plan before running the loop; user says "plan my research"; user has a goal but no research.md; user invokes /autoresearch:plan. DO NOT TRIGGER when: research.md already exists and the user wants to run the loop; user wants a one-shot answer; user wants to debug, not optimize.
Multi-perspective deliberation engine. Gathers independent positions from diverse personas, runs cross-examination and rebuttal rounds, detects herd behavior, and synthesizes a neutral judge verdict with confidence levels. TRIGGER when: user wants multi-perspective prediction, forecasting, scenario analysis, decision analysis, "what will happen if", "should we", "predict the outcome of", structured devil's advocacy, or any question benefiting from adversarial deliberation.
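The fix skill above is described as cascade-aware: dependency errors are fixed before their dependents. One way to picture that ordering is a topological sort over a "caused by" graph. The sketch below is illustrative only; the error IDs and the `fix_order` helper are hypothetical and are not taken from the plugin.

```python
from graphlib import TopologicalSorter


def fix_order(errors: dict[str, set[str]]) -> list[str]:
    """Order error IDs so each error comes after the errors that cause it.

    `errors` maps a (hypothetical) error ID to the set of error IDs it
    depends on. This is an assumed data shape, not the plugin's format.
    """
    # TopologicalSorter expects {node: predecessors}; static_order()
    # yields every predecessor before the nodes that depend on it.
    return list(TopologicalSorter(errors).static_order())


# Hypothetical cascade: a missing import causes a type error,
# which in turn causes a test failure.
errors = {
    "test_failure": {"type_error"},
    "type_error": {"missing_import"},
    "missing_import": set(),
}
print(fix_order(errors))
```

Fixing in this order means a single root-cause fix may clear several downstream errors before they are ever attempted.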

Define a goal. Let the agent research, experiment, and iterate -- autonomously.
| # | Example | Result | Iterations | Evaluator |
|---|---|---|---|---|
| 1 | Code Optimization — Sort 1M integers faster | 2.12s → 0.15s (−93%) | 8 | benchmark.py |
| 2 | Function Fitting — Discover hidden math function | RMSE 2.11 → 0.030 (−99%) | 8 | evaluate.py |
| 3 | Skill Elaboration — Improve P&ID analysis skill | 0.28 → 0.98 composite (+255%) | 2 | evaluate.py |
| 4 | Literature Review — Exercise timing papers | 1/8 → 8/8 categories, 19 papers | 4 | Agent (Tier 2) |
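The table lists script evaluators such as `evaluate.py`. As a rough idea of what a mechanical evaluator for the function-fitting example might look like, here is a minimal sketch that scores a candidate function by RMSE. The `candidate` function and the data points are made up for illustration and are not the project's actual evaluator or dataset.

```python
import math


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def candidate(x: float) -> float:
    """Hypothetical candidate function that the loop would keep revising."""
    return 2.0 * x + 1.0


# Illustrative data only (assumed, not from the project).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.5]

score = rmse([candidate(x) for x in xs], ys)
# Emit one number so the result can be compared without human judgment.
print(f"RMSE: {score:.4f}")
```

Printing a single, parseable number is what makes the comparison between iterations mechanical.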
[!NOTE] An LLM skill that turns natural-language research goals into autonomous experiment-evaluate-iterate loops -- inspired by Karpathy's autoresearch. Write a research.md, and the agent handles hypothesis generation, experimentation, evaluation, and iteration. Works with Claude Code, Codex CLI, and Gemini CLI.
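The experiment-evaluate-iterate loop described above can be sketched as a keep-or-discard search. This is a toy illustration under stated assumptions: the metric is lower-is-better, and `propose` and `evaluate` are hypothetical callables standing in for the agent's hypothesis step and the evaluator script. It is not the skill's actual implementation.

```python
import random
from typing import Callable


def research_loop(
    baseline: float,
    propose: Callable[[float], float],
    evaluate: Callable[[float], float],
    target: float,
    max_iterations: int,
) -> tuple[float, float, list[tuple[int, float, str]]]:
    """Keep candidates that improve the score; discard the rest.

    Stops when the score reaches `target` or the budget runs out.
    Assumes a lower score is better.
    """
    best, best_score = baseline, evaluate(baseline)
    log = [(0, best_score, "baseline")]
    for i in range(1, max_iterations + 1):
        if best_score <= target:
            break  # target metric achieved
        candidate = propose(best)   # hypothesize + experiment
        score = evaluate(candidate)  # mechanical evaluation
        if score < best_score:
            best, best_score = candidate, score
            log.append((i, score, "kept"))
        else:
            log.append((i, score, "discarded"))
    return best, best_score, log


# Toy problem: move x toward 3 by random perturbation (seeded for repeatability).
rng = random.Random(0)
best, score, log = research_loop(
    baseline=0.0,
    propose=lambda x: x + rng.uniform(-1.0, 1.0),
    evaluate=lambda x: (x - 3.0) ** 2,
    target=0.01,
    max_iterations=200,
)
print(f"best={best:.3f} score={score:.4f} iterations={len(log) - 1}")
```

The `log` list plays the role of an iteration record: every attempt is written down with its score, whether it was kept or discarded.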
- research.md is your program: define goals, metrics, and constraints in plain English
- research_log.md with timestamps, changes, and results

| Command | Purpose |
|---|---|
| /autoresearch | Core 5-stage loop: understand, hypothesize, experiment, evaluate, log & iterate |
| /autoresearch:plan | 7-step setup wizard that produces a ready-to-run research.md |
| /autoresearch:debug | Scientific bug hunting with falsifiable hypotheses and evidence tables |
| /autoresearch:fix | Iterative error crusher that runs until the error count reaches zero |
| /autoresearch:predict | Multi-persona deliberation with anti-herd-bias detection |
| /autoresearch:security | STRIDE + OWASP iterative security audit |
| /autoresearch:scenario | 12-dimension scenario exploration for decision analysis |
| /autoresearch:reason | Adversarial refinement with blind-judge scoring panel |
| /autoresearch:ship | Universal shipping workflow supporting 9 ship types |
| /autoresearch:learn | (planned) Self-improving skill loop from feedback |
What do you want to do?
| Goal | Use |
|---|---|
| Optimize something iteratively toward a numeric target | /autoresearch |
| Set up a new research project from scratch | /autoresearch:plan |
| Hunt down a hard-to-reproduce bug | /autoresearch:debug |
| Crush all errors in a codebase to zero | /autoresearch:fix |
| Forecast outcomes or predict what will happen | /autoresearch:predict |
| Audit a system for security vulnerabilities | /autoresearch:security |
| Explore "what if" scenarios before committing to a path | /autoresearch:scenario |
| Think through a complex decision rigorously | /autoresearch:reason |
| Release a feature, library, or artifact | /autoresearch:ship |
Other autoresearch implementations provide the loop concept. This repo provides the complete toolkit:
npx claudepluginhub wjgoarxiv/autoresearch-skill