By tbelbek
Closed-loop empirical experiment runner for Claude Code. Edits one file, runs it, keeps changes that improve a scalar metric. Faithful port of Andrej Karpathy's autoresearch pattern (github.com/karpathy/autoresearch).
A Claude Code plugin that runs Andrej Karpathy's autoresearch pattern on any repo — autonomously edit one file, run it, keep changes that improve a scalar metric, reset the ones that don't. Git is the memory. A TSV is the audit trail.
Run the loop on your repo to make it better. Define a metric. Walk away. Review diffs when you come back.
You give it a program.md spec — editable file, frozen files, one shell command, one scalar metric, a budget. The skill then hill-climbs:
for N trials:
    pick next hypothesis from program.md
    edit the editable file
    commit WIP
    run the command (hard wall-clock kill)
    parse metric + guards from run.log
    if metric improved AND guards pass -> keep commit
    else -> git reset --hard
    append row to .autoresearch/results.tsv
One git branch per run (autoresearch/<tag>). One TSV row per trial. No LLM self-grading — the scalar metric decides.
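The keep-or-reset decision in the loop can be sketched in Python as below. This is a minimal simulation, not the plugin's code: the trial outcomes are hard-coded instead of coming from real runs, the git steps are reduced to comments, and the TSV column names, the `baseline` argument, and the function names are assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, Optional, Tuple

# One simulated trial: (hypothesis, parsed metric or None if the run failed, guards passed?)
Trial = Tuple[str, Optional[float], bool]


def improved(candidate: float, best: float, direction: str) -> bool:
    """True if candidate strictly beats best for the given direction ("min" or "max")."""
    return candidate < best if direction == "min" else candidate > best


def hill_climb(baseline: float, trials: Iterable[Trial], direction: str, out) -> float:
    """Apply the keep/reset rule to each trial and write one TSV row per trial."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Hypothetical column names; the real results.tsv layout may differ.
    writer.writerow(["trial", "hypothesis", "metric", "guards_ok", "status"])
    best = baseline
    for i, (hypothesis, metric, guards_ok) in enumerate(trials, start=1):
        keep = metric is not None and guards_ok and improved(metric, best, direction)
        if keep:
            best = metric  # real loop: the WIP commit stays on the branch
        # otherwise the real loop would run `git reset --hard` to drop the change
        writer.writerow([i, hypothesis, "" if metric is None else metric,
                         guards_ok, "keep" if keep else "reset"])
    return best


# Made-up outcomes, purely to exercise the rule.
trials = [
    ("smaller learning rate", 1.30, True),   # improves on the baseline: keep
    ("bigger batch", 1.35, True),            # worse: reset
    ("drop warmup", 1.28, False),            # better, but a guard failed: reset
    ("remove unused layer", 1.27, True),     # improves again: keep
    ("crashed run", None, True),             # no metric parsed: reset
]
buf = io.StringIO()
best = hill_climb(1.32, trials, "min", buf)
print(buf.getvalue())
print("best:", best)
```

The metric alone never keeps a change: the "drop warmup" trial scores better than the current best but is still reset because its guards failed.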
Generic "AI coding agents" plan, implement, and declare victory. This runs code, reads a number, and either keeps the change or throws it out. Closer to stochastic optimization than to code-gen chat.
| | Generic agent | karpathy-autoresearch |
|---|---|---|
| Judge | LLM self-assessment | Ground-truth scalar metric |
| Memory | Chat context | Git branch + results.tsv |
| Action surface | Whole codebase | Exactly one file |
| Stop condition | "I think we're done" | Budget / no-improvement streak |
| Bias | Add features | Prefer code deletion |
claude plugin marketplace add tbelbek/karpathy-autoresearch
claude plugin install karpathy-autoresearch@karpathy-autoresearch-marketplace
1. cd into any repo you want to improve.
2. Run /karpathy-autoresearch. If no program.md exists, the skill offers to scaffold one from the template and stops so you can fill it in.
3. Run /karpathy-autoresearch again. It echoes the program back and asks for authorization.
4. When the run finishes, review .autoresearch/results.tsv and the git branch of experiments.

program.md sections

- Editable file: train.py (the only file the loop may modify)
- Frozen files: evaluate.py, pyproject.toml
- Command: uv run train.py > run.log 2>&1
- Metric: val_bpb, direction min, parsed via regex on run.log

Full template: skills/karpathy-autoresearch/references/program.template.md.
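The command and metric entries translate into two small steps: run the command under a hard wall-clock limit with output sent to run.log, then pull the metric out of the log with a regex. The Python sketch below illustrates both under stated assumptions. The log line format (`val_bpb: <number>`), the regex, and the function names are hypothetical and not taken from the plugin, and the process-group kill is POSIX-only.

```python
import os
import re
import signal
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Assumed log format: lines containing "val_bpb: 1.234" or "val_bpb=1.234".
METRIC_RE = re.compile(r"val_bpb[:=]\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def run_with_timeout(command: str, log_path: Path, timeout_s: float) -> bool:
    """Run a shell command with stdout and stderr written to log_path.

    Returns True only if the command exits with code 0 within timeout_s seconds.
    On timeout the whole process group is killed (POSIX only).
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            command,
            shell=True,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, so the shell and its children share one group
        )
        try:
            return proc.wait(timeout=timeout_s) == 0
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)  # hard kill, no grace period
            proc.wait()
            return False


def parse_metric(log_text: str) -> Optional[float]:
    """Return the last metric value reported in the log, or None if none was found."""
    matches = METRIC_RE.findall(log_text)
    return float(matches[-1]) if matches else None


sample_log = "step 100 val_bpb: 1.312\nstep 200 val_bpb: 1.287\n"
print(parse_metric(sample_log))  # last reported value: 1.287
```

Taking the last match rather than the first is a design choice for logs that print the metric repeatedly during a run. A trial with no match returns None and can be treated as a failed trial.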
- program.md: no silent defaults
- git reset --hard on every rejected trial: no creeping state

Pattern by Andrej Karpathy: https://github.com/karpathy/autoresearch. This plugin is an independent port to Claude Code; no code from Karpathy's repo is included.
MIT
npx claudepluginhub tbelbek/karpathy-autoresearch --plugin karpathy-autoresearch