npx claudepluginhub abdielou/autoeval
Transform vague optimization problems into fully scaffolded autonomous experiment loops with eval suites, scoring functions, and meta-agent directives.
A Claude Code skill that transforms a vague optimization problem into a fully scaffolded, runnable autonomous experiment loop — bridging the gap between "I have an idea" and "I have an autonomous experiment running overnight."
The /autoeval command guides you through defining your optimization problem, designing an automatic scoring function, and building a complete eval suite — then scaffolds everything the meta-agent needs to run overnight: a program.md directive, seed implementation, and kickoff command.
All share the same core loop: change → run → score → keep/discard → repeat. autoeval helps you set up that loop for any problem.
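As a rough illustration of that loop, here is a minimal Python sketch. It is not autoeval's implementation: the names `optimize`, `propose_change`, and `score` are hypothetical, and the sketch assumes a higher score is better and that a candidate is kept only when it strictly beats the current best.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def optimize(
    seed: T,
    propose_change: Callable[[T], T],
    score: Callable[[T], float],
    iterations: int,
) -> Tuple[T, float]:
    """Hypothetical change -> run -> score -> keep/discard loop."""
    best, best_score = seed, score(seed)
    for _ in range(iterations):
        candidate = propose_change(best)    # change
        candidate_score = score(candidate)  # run + score
        if candidate_score > best_score:    # keep only strict improvements
            best, best_score = candidate, candidate_score
        # otherwise the candidate is discarded and the loop repeats
    return best, best_score


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): find x close to 3.0.
    rng = random.Random(0)
    best, best_score = optimize(
        seed=0.0,
        propose_change=lambda x: x + rng.uniform(-1.0, 1.0),
        score=lambda x: -(x - 3.0) ** 2,
        iterations=200,
    )
    print(f"best={best:.3f} score={best_score:.3f}")
```

In a real run the "change" step would be an edit made by the meta-agent and the "score" step would be the eval suite; the toy lambdas only stand in for those.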
Interactive:
/plugin
Go to the Marketplaces tab, add abdielou/autoeval, then switch to the Discover tab and install.
CLI:
/plugin marketplace add abdielou/autoeval
/plugin install autoeval@abdielou-autoeval
User-level (available in all projects):
git clone https://github.com/abdielou/autoeval.git ~/.claude/skills/autoeval
Project-level (available in one project):
git clone https://github.com/abdielou/autoeval.git .claude/skills/autoeval
claude --plugin-dir ./path/to/autoeval
/autoeval <describe your optimization problem>
/autoeval --auto <describe your optimization problem>
Examples:
/autoeval generate realistic engine sounds driven by RPM and throttle inputs
/autoeval build an agent that recommends energy procurement strategies
/autoeval --auto improve my RAG pipeline's retrieval accuracy
The --auto flag makes Phases 3-6 run autonomously after the interactive metric design is locked in.
autoeval runs through 6 phases:
Phases 1-2 are always interactive. Phases 3-6 are interactive by default, or autonomous with --auto.
| Artifact | Description |
|---|---|
| program.md | Meta-agent directive: goal, edit surface, eval commands, keep/discard rules |
| Seed harness | Minimal baseline implementation with clearly marked edit surface |
| evals/ | Test cases, scoring functions, runner script, coverage documentation |
| Environment files | Dependencies, Dockerfile (optional), .gitignore |
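To make the evals/ row more concrete, the sketch below shows one way test cases, a scoring function, and a runner could fit together. The actual files autoeval generates may look quite different: `EvalCase`, `exact_match`, and `run_suite` are hypothetical names, and exact-match scoring is only a placeholder for a problem-specific metric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One hypothetical test case: an input and the expected output."""
    name: str
    input: str
    expected: str


def exact_match(actual: str, expected: str) -> float:
    """Toy scoring function: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if actual.strip() == expected.strip() else 0.0


def run_suite(
    harness: Callable[[str], str],
    cases: List[EvalCase],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Run every case through the harness and return the mean score."""
    if not cases:
        raise ValueError("eval suite has no cases")
    total = sum(scorer(harness(case.input), case.expected) for case in cases)
    return total / len(cases)


if __name__ == "__main__":
    cases = [
        EvalCase("upper-1", "hi", "HI"),
        EvalCase("upper-2", "x", "y"),  # deliberately failing case
    ]
    # str.upper stands in for the seed harness under test.
    print(run_suite(str.upper, cases))
```

Returning a single number from the runner is what lets the keep/discard rule compare one change against the previous best.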
After autoeval completes:
claude --dangerously-skip-permissions --append-system-prompt-file program.md "Start the optimization loop."
The meta-agent reads program.md and begins the autonomous loop — modifying the seed implementation, scoring each change, keeping improvements, and repeating indefinitely.