From skill-eval
Run an autoresearch-style optimization loop on a target identified by M3. Generates improvements via LLM, evaluates them against the target's eval questions, keeps improvements that beat the best score, and tracks all experiments in a JSONL state file. Inspired by Karpathy's autoresearch pattern.
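The keep-if-better loop described above can be sketched roughly as follows. This is a minimal illustration and not the script's actual implementation: `runLoop`, `toJsonl`, the `ExperimentRecord` fields, and the `propose`/`evaluate` callbacks are all hypothetical names, with `propose` standing in for the LLM call and `evaluate` standing in for scoring against the target's eval questions.

```typescript
// Hypothetical record shape; the real state file's fields are not documented here.
interface ExperimentRecord {
  round: number;
  score: number;
  kept: boolean;
}

interface LoopResult {
  best: string;
  bestScore: number;
  records: ExperimentRecord[];
}

// propose: placeholder for the LLM call that drafts an improved candidate.
// evaluate: placeholder for scoring a candidate against the eval questions.
function runLoop(
  initial: string,
  propose: (current: string, round: number) => string,
  evaluate: (candidate: string) => number,
  maxRounds: number,
): LoopResult {
  let best = initial;
  let bestScore = evaluate(initial);
  const records: ExperimentRecord[] = [];

  for (let round = 1; round <= maxRounds; round++) {
    const candidate = propose(best, round);
    const score = evaluate(candidate);
    // Keep a candidate only if it strictly beats the best score so far.
    const kept = score > bestScore;
    if (kept) {
      best = candidate;
      bestScore = score;
    }
    // Every experiment is recorded, whether or not it was kept.
    records.push({ round, score, kept });
  }
  return { best, bestScore, records };
}

// JSONL: one JSON object per line.
function toJsonl(records: ExperimentRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Toy usage: the "score" is the candidate's length, capped at 3.
const result = runLoop(
  "a",
  (current) => current + "a",
  (candidate) => Math.min(candidate.length, 3),
  5,
);
console.log(result.best, result.bestScore);
console.log(toJsonl(result.records));
```

In the toy run the score plateaus at the cap, so later rounds are logged but not kept. A real run would append each record to a file in the state directory instead of holding them in memory.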
Slash command
/skill-eval:autoresearch-loop
Iterative optimization of skills, prompts, and code via the autoresearch pattern.
# Run optimization loop on the top-ranked target
bun scripts/autoresearch-loop.ts -t targets.jsonl --max-rounds 20
# Optimize a specific target by rank
bun scripts/autoresearch-loop.ts -t targets.jsonl --target-rank 3 --max-rounds 10
# Dry run: show what would be optimized
bun scripts/autoresearch-loop.ts -t targets.jsonl --dry-run
# Use a custom state directory
bun scripts/autoresearch-loop.ts -t targets.jsonl --state-dir ./experiments-v2
# View summary of previous experiments
bun scripts/autoresearch-loop.ts --summary --state-dir ./experiments
| Flag | Description |
|---|---|
| `--targets`, `-t <file>` | Input targets.jsonl from M3 (required unless `--summary`) |
| `--target-rank <n>` | Which target to optimize by rank (default: 1 = highest) |
| `--max-rounds <n>` | Max improvement rounds (default: 10) |
| `--state-dir <dir>` | Directory for experiment state files (default: ./experiments) |
| `--summary` | Print summary of existing experiments |
| `--dry-run` | Show what would be optimized without running |
| `--help` | Show help |
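To make the defaults and the "required unless --summary" rule concrete, here is a small hand-rolled parser for the flags in the table. It is an illustrative sketch only: `parseFlags` and `LoopOptions` are hypothetical names, the real script may parse its arguments differently, and exempting `--help` from the `--targets` requirement is an assumption.

```typescript
// Hypothetical options shape mirroring the flag table.
interface LoopOptions {
  targets?: string;
  targetRank: number;
  maxRounds: number;
  stateDir: string;
  summary: boolean;
  dryRun: boolean;
  help: boolean;
}

function parseFlags(argv: string[]): LoopOptions {
  // Defaults taken from the flag table.
  const opts: LoopOptions = {
    targetRank: 1,
    maxRounds: 10,
    stateDir: "./experiments",
    summary: false,
    dryRun: false,
    help: false,
  };

  let i = 0;
  // Read the value that follows a value-taking flag.
  const takeValue = (flag: string): string => {
    const value = argv[i++];
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    return value;
  };
  // Parse a positive integer value, rejecting anything else.
  const takeInt = (flag: string): number => {
    const n = Number(takeValue(flag));
    if (!Number.isInteger(n) || n < 1) throw new Error(`Invalid number for ${flag}`);
    return n;
  };

  while (i < argv.length) {
    const flag = argv[i++];
    switch (flag) {
      case "--targets":
      case "-t":
        opts.targets = takeValue(flag);
        break;
      case "--target-rank":
        opts.targetRank = takeInt(flag);
        break;
      case "--max-rounds":
        opts.maxRounds = takeInt(flag);
        break;
      case "--state-dir":
        opts.stateDir = takeValue(flag);
        break;
      case "--summary":
        opts.summary = true;
        break;
      case "--dry-run":
        opts.dryRun = true;
        break;
      case "--help":
        opts.help = true;
        break;
      default:
        throw new Error(`Unknown flag: ${flag}`);
    }
  }

  // Assumption: --help, like --summary, does not need a targets file.
  if (opts.targets === undefined && !opts.summary && !opts.help) {
    throw new Error("--targets is required unless --summary is given");
  }
  return opts;
}

const parsed = parseFlags(["-t", "targets.jsonl", "--target-rank", "3", "--max-rounds", "20"]);
console.log(parsed);
```

Flags that are omitted fall back to the documented defaults, so `parsed.stateDir` here is `./experiments`.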
@anthropic-ai/sdk (npm)
ANTHROPIC_API_KEY environment variable
npx claudepluginhub asragab/asragab-claude-marketplace --plugin skill-eval