Skill

evolve-score

Score one evolution candidate against the workspace evaluator. Runs the candidate's evolution_<id>.py through evaluator.py under the same sandbox the claude-evolve engine uses, parses the performance number, and writes status + performance back to evolution.csv. Use when the user says "score gen02-003", "evaluate this candidate", "run the evaluator on <id>", or when a parent skill dispatches scoring. Scoring is deterministic (no model reasoning needed) — this skill exists so the evaluator's output noise stays out of the main conversation.

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-evolve:evolve-score <candidate-id> [--working-dir DIR]

User invocable

Model invocable

Inline context

Default effort

Argument hint<candidate-id> [--working-dir DIR]

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Score a single candidate. This is **deterministic** — the work is "run the user's `evaluator.py` and record the number." There is no algorithmic judgment to make. The reason it is a skill (and, in the omnibus, a Haiku subagent) is purely to keep evaluator output — which can be thousands of lines — out of the main conversation.

SKILL.md

49 lines · ~728 tokens

Stats

LanguagePython

Parent stars1

MaintenanceGood

Last CommitJun 14, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

evolve-score

Score a single candidate. This is deterministic — the work is "run the user's evaluator.py and record the number." There is no algorithmic judgment to make. The reason it is a skill (and, in the omnibus, a Haiku subagent) is purely to keep evaluator output — which can be thousands of lines — out of the main conversation.

Resolve the plugin root

This skill runs with $CLAUDE_PLUGIN_ROOT set. Capture it:

echo "PLUGIN_ROOT=$CLAUDE_PLUGIN_ROOT"

If a subagent invoked you without it, fall back to the path the caller passed you in the prompt.

Score the candidate

Run exactly one command. It syntax-checks the candidate, runs validator.py if the workspace has one, runs evaluator.py under the sandbox, parses the score, and writes the result to the CSV:

python3 "$CLAUDE_PLUGIN_ROOT/scripts/score.py" --working-dir "<WORKING_DIR>" <CANDIDATE_ID>

<WORKING_DIR> is the evolution workspace (the directory containing config.yaml). If the user didn't specify one, omit --working-dir and let it auto-detect evolution/config.yaml or ./config.yaml.
The script prints one JSON line on stdout (everything else is stderr noise you can ignore):
- success: {"id":"...","ok":true,"score":1.23,"status":"complete","extra":{...}}
- failure: {"id":"...","ok":false,"status":"failed|failed-validation","error":"...","stage":"syntax|validator|evaluator"}

The script has already written the status and performance to evolution.csv either way — you do not touch the CSV yourself.

Report

Return a single line to whoever called you:

success → <id>: score=<n> (complete)
failure → <id>: FAILED at <stage> — <short error> (status=<status>)

Do not dump the full evaluator output into your reply. If the user explicitly asks why something failed, include the error field; otherwise keep it to the one-liner. That brevity is the whole point of running scoring in an isolated context.

Do not

Do not edit the candidate's code to make it pass — that is evolve-code's job. A failing score is a real result, not a problem to fix here.
Do not retry with a different evaluator or fabricate a score. If the evaluator fails, report the failure honestly; the engine has recorded failed.

evolve-score

Popularity

Invocation

Context Preview

SKILL.md

evolve-score

Popularity

Invocation

Context Preview

SKILL.md

evolve-score

Resolve the plugin root

Score the candidate

Report

Do not

Similar Skills

evolve-score

Resolve the plugin root

Score the candidate

Report

Do not

Similar Skills