
abdielou-autoeval

npx claudepluginhub abdielou/autoeval


autoeval (v0.9.0, by abdielou, updated Apr 18, 2026)

Transform vague optimization problems into fully scaffolded autonomous experiment loops with eval suites, scoring functions, and meta-agent directives.


autoeval — Autonomous Optimization Loop Scaffolder

A Claude Code skill that transforms a vague optimization problem into a fully scaffolded, runnable autonomous experiment loop — bridging the gap between "I have an idea" and "I have an autonomous experiment running overnight."

What it does

The /autoeval command guides you through defining your optimization problem, designing an automatic scoring function, and building a complete eval suite — then scaffolds everything the meta-agent needs to run overnight: a program.md directive, seed implementation, and kickoff command.

Key features

Problem classification — maps your problem against a 12-type loop taxonomy (Training, Agent Harness, Generative Output, Algorithm Performance, and 8 more)
Metric design — interactive exploration of scoring functions with stress-testing for gameability, signal strength, and cost
Eval suite building — the crown jewel. Coverage strategy, test cases, scoring functions, sanity checks, and gap documentation
Edit surface marking — clearly separates what the meta-agent can modify from fixed infrastructure
program.md generation — complete meta-agent directive with iteration protocol, constraints, and domain guidance
Exit ramp — identifies problems that aren't suited for optimization loops and says so, rather than building a bad loop
Socrates integration — dialectic stress-testing at key decision points (optional, via zetaminusone/socrates)
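To make the eval-suite and metric-design features concrete, here is a minimal sketch of a test-case set, a scoring function, and gameability sanity checks. Everything in it is a hypothetical illustration, not the files autoeval actually generates. That includes the `TestCase` structure, the token-F1 `score_output` metric, `run_suite`, `sanity_check`, and the 0.5 threshold.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class TestCase:
    """One eval case: an input and the reference answer it is scored against (hypothetical layout)."""
    name: str
    prompt: str
    expected: str


def score_output(output: str, expected: str) -> float:
    """Hypothetical scoring function: case-insensitive token-level F1 in [0, 1]."""
    out_tokens = output.lower().split()
    ref_tokens = expected.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(out_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0  # also covers empty output, avoiding division by zero
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def run_suite(system: Callable[[str], str], cases: list[TestCase]) -> float:
    """Run the system under test on every case and return its mean score."""
    return mean(score_output(system(c.prompt), c.expected) for c in cases)


def sanity_check(cases: list[TestCase]) -> None:
    """Gameability checks: trivial systems must score poorly and the reference answers perfectly."""
    lookup = {c.prompt: c.expected for c in cases}
    assert run_suite(lambda _prompt: "", cases) == 0.0, "empty output must not earn credit"
    # 0.5 is an arbitrary illustrative threshold for "parroting the input scores too well"
    assert run_suite(lambda prompt: prompt, cases) < 0.5, "echoing the input must not score well"
    assert run_suite(lambda prompt: lookup[prompt], cases) == 1.0, "references must score perfectly"


cases = [
    TestCase("capital", "What is the capital of France?", "Paris"),
    TestCase("arithmetic", "What is 2 plus 2?", "4"),
]
sanity_check(cases)  # raises AssertionError if the metric is trivially gameable
```

The sanity checks encode the idea behind stress-testing for gameability: if returning nothing or repeating the prompt can earn a good score, the metric will reward the meta-agent for the wrong thing.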

Inspired by

autoresearch — autonomous ML research (Andrej Karpathy, 2025)
autoagent — autonomous agent harness engineering (Kevin Gu, 2025)
AIDE — AI-driven ML experiment iteration (Weco AI)
OpenEvolve — open-source AlphaEvolve, evolutionary code optimization
AI-Scientist — fully automated open-ended scientific discovery (Sakana AI)

All share the same core loop: change → run → score → keep/discard → repeat. autoeval helps you set up that loop for any problem.
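The shared change → run → score → keep/discard → repeat loop can be sketched as greedy hill climbing. This is a simplified illustration under stated assumptions: `optimize`, `propose_change`, and `evaluate` are hypothetical stand-ins. In a real autoeval setup the meta-agent edits code to produce each change, and the eval suite produces the score.

```python
import random
from typing import Callable, TypeVar

State = TypeVar("State")


def optimize(
    seed: State,
    propose_change: Callable[[State, random.Random], State],
    evaluate: Callable[[State], float],
    iterations: int,
    rng: random.Random,
) -> tuple[State, float, list[float]]:
    """Greedy keep/discard loop: accept a candidate only if it beats the best score so far."""
    best, best_score = seed, evaluate(seed)  # score the baseline before changing anything
    history = [best_score]
    for _ in range(iterations):
        candidate = propose_change(best, rng)  # change
        score = evaluate(candidate)            # run + score
        if score > best_score:                 # keep the improvement...
            best, best_score = candidate, score
        # ...otherwise discard it and try again from the current best
        history.append(best_score)
    return best, best_score, history


# Toy usage (hypothetical): the "implementation" is one float and the score peaks at x = 3.0
best, best_score, history = optimize(
    seed=0.0,
    propose_change=lambda x, r: x + r.uniform(-1.0, 1.0),
    evaluate=lambda x: -((x - 3.0) ** 2),
    iterations=200,
    rng=random.Random(0),
)
```

Because a change is only kept when it strictly improves the score, the best-so-far history never decreases. This is why the quality of the scoring function matters so much: the loop optimizes exactly what it measures.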

Installation

Plugin Marketplace (recommended)

Interactive:

/plugin

Go to the Marketplaces tab, add abdielou/autoeval, then switch to the Discover tab and install.

CLI:

/plugin marketplace add abdielou/autoeval
/plugin install autoeval@abdielou-autoeval

Manual (git clone)

User-level (available in all projects):

git clone https://github.com/abdielou/autoeval.git ~/.claude/skills/autoeval

Project-level (available in one project):

git clone https://github.com/abdielou/autoeval.git .claude/skills/autoeval

Local development

claude --plugin-dir ./path/to/autoeval

Usage

/autoeval <describe your optimization problem>
/autoeval --auto <describe your optimization problem>

Examples:

/autoeval generate realistic engine sounds driven by RPM and throttle inputs
/autoeval build an agent that recommends energy procurement strategies
/autoeval --auto improve my RAG pipeline's retrieval accuracy

The --auto flag makes Phases 3-6 run autonomously after the interactive metric design is locked in.

How it works

autoeval runs through 6 phases:

1. Problem Definition (interactive): clarify what you're building, classify the loop type, and exit if it isn't an optimization problem
2. Metric Design (interactive): find an automatic scoring function and stress-test it for gameability and signal strength
3. Eval Suite (interactive by default): build test cases, scoring functions, and a coverage strategy
4. Harness Scaffolding: generate a minimal seed implementation with a marked edit surface
5. Program.md Generation: write the meta-agent directive with iteration protocol and constraints
6. Loop Setup: wire everything together, verify the baseline, and provide the kickoff command

Phases 1-2 are always interactive. Phases 3-6 are interactive by default, or autonomous with --auto.
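One way to stress-test a metric's signal strength during Metric Design is to compare its run-to-run noise with the size of the improvements you hope to detect. The sketch below assumes a noise-margin rule; it is not autoeval's documented procedure. The names `noise_floor`, `is_real_improvement`, `noisy_metric`, and the `margin` multiplier are all hypothetical.

```python
import random
from statistics import mean, stdev
from typing import Callable


def noise_floor(metric: Callable[[], float], repeats: int = 10) -> tuple[float, float]:
    """Score the same unchanged system several times; return the mean and standard deviation."""
    if repeats < 2:
        raise ValueError("need at least two runs to estimate noise")
    scores = [metric() for _ in range(repeats)]
    return mean(scores), stdev(scores)


def is_real_improvement(baseline_mean: float, baseline_sd: float,
                        candidate: float, margin: float = 2.0) -> bool:
    """Treat a candidate as better only if it beats the baseline by more than `margin` noise SDs."""
    return candidate > baseline_mean + margin * baseline_sd


# Toy metric (hypothetical): true quality 0.70 plus Gaussian evaluation noise with sd 0.02
rng = random.Random(42)


def noisy_metric() -> float:
    return 0.70 + rng.gauss(0.0, 0.02)


base_mean, base_sd = noise_floor(noisy_metric, repeats=20)
print(f"baseline {base_mean:.3f} ± {base_sd:.3f}")
print("+0.01 clears the noise bar:", is_real_improvement(base_mean, base_sd, base_mean + 0.01))
```

A metric whose noise is larger than the gains the loop can realistically make will cause the meta-agent to keep lucky runs. The margin of 2 is an arbitrary choice; raising it trades sensitivity for fewer false keeps.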

What it produces

| Artifact | Description |
|---|---|
| `program.md` | Meta-agent directive: goal, edit surface, eval commands, keep/discard rules |
| Seed harness | Minimal baseline implementation with a clearly marked edit surface |
| `evals/` | Test cases, scoring functions, runner script, coverage documentation |
| Environment files | Dependencies, Dockerfile (optional), `.gitignore` |
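The separation between the editable seed harness and fixed infrastructure such as `evals/` and `program.md` can also be checked mechanically before a change is scored. In practice the list of changed paths could come from `git diff --name-only`. This is a hypothetical sketch: the `harness/` directory, the glob patterns, and the `check_edit_surface` helper are illustrative assumptions, not part of autoeval's generated output.

```python
from fnmatch import fnmatch
from typing import Iterable

# Hypothetical layout: the meta-agent may edit only the seed harness; everything else is fixed.
# Note: fnmatch's "*" also matches "/", so these patterns cover subdirectories too.
EDITABLE = ["harness/*.py"]
PROTECTED = ["evals/*", "program.md"]


def check_edit_surface(changed_paths: Iterable[str]) -> list[str]:
    """Return changed paths outside the editable surface; an empty list means the change is allowed."""
    violations = []
    for path in changed_paths:
        protected = any(fnmatch(path, pattern) for pattern in PROTECTED)
        editable = any(fnmatch(path, pattern) for pattern in EDITABLE)
        if protected or not editable:
            violations.append(path)
    return violations


print(check_edit_surface(["harness/agent.py", "evals/score.py"]))
```

Rejecting any change that touches the scoring code keeps the meta-agent from improving its score by editing the evaluator instead of the implementation.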

Kicking off the experiment

After autoeval completes:

claude --dangerously-skip-permissions --append-system-prompt-file program.md "Start the optimization loop."

The meta-agent reads program.md and begins the autonomous loop — modifying the seed implementation, scoring each change, keeping improvements, and repeating indefinitely.
