From optimize-anything
Create or write an evaluator script for scoring text artifacts, prompts, or configs during gepa optimization. Use when asked to build, scaffold, or generate an evaluator, scoring function, or judge for optimize-anything.
Slash command: /optimize-anything:generate-evaluator
Generate an evaluator that scores candidate artifacts for optimization with gepa. Include diagnostic feedback so reflections can improve weak dimensions.
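Input contract:
- Command evaluators (--evaluator-command) read JSON on stdin; HTTP evaluators (--evaluator-url) receive it as the POST body: {"candidate": "<text>"}
- With a dataset (--dataset), the payload also carries the current example: {"candidate": "<text>", "example": {...}}

Output: a JSON object with score (float, usually in [0,1]), plus optional side-info fields.

generate-evaluator now defaults to --evaluator-type judge.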
- --evaluator-type judge (default): an LLM-judge script that requests response_format={"type": "json_object"} and includes dimension scores.
- --evaluator-type command
- --evaluator-type http
- --evaluator-type composite … score: 0.0.

Options:
- --evaluator-type judge|command|http|composite
- --model <litellm-model>: hardcodes the judge model into judge/composite scripts.
- --dataset: generates dataset-aware templates that read example and show how to use it in scoring.
- --intake-json / --intake-file: embeds rubric/quality dimensions.

Generate a judge evaluator and test it:
```bash
# Generate
optimize-anything generate-evaluator seed.txt \
  --objective "Score clarity and specificity" \
  --model openai/gpt-4o-mini > eval_judge.py

# Test it
echo '{"candidate":"Your artifact text here"}' | python3 eval_judge.py
```
This returns JSON like:
```json
{"score": 0.82, "reasoning": "Clear structure but lacks examples", "clarity": 0.9, "specificity": 0.7}
```
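Any script that honors the --evaluator-command contract (a JSON payload on stdin, a JSON object with score on stdout) can take the place of a generated template. The following is a minimal hand-written sketch, not output of generate-evaluator: the word-count heuristic, target_words, and the word_count/feedback/error fields are hypothetical, chosen only to show the shape of input, score, and side info.

```python
#!/usr/bin/env python3
"""Minimal command-evaluator sketch (hypothetical heuristic, not a generated template).

Reads {"candidate": "<text>"} from stdin and prints {"score": float, ...}.
"""
import json
import sys


def score_candidate(text: str) -> dict:
    # Hypothetical heuristic: reward candidates close to a target word count.
    target_words = 150
    words = len(text.split())
    length_score = max(0.0, 1.0 - abs(words - target_words) / target_words)
    return {
        "score": round(length_score, 3),
        # Side-info fields give the reflection step something concrete to act on.
        "word_count": words,
        "feedback": f"{words} words; target is about {target_words}.",
    }


def main() -> None:
    try:
        payload = json.load(sys.stdin)
        result = score_candidate(payload["candidate"])
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        # Malformed input: report the lowest score with an explanation instead of crashing.
        result = {"score": 0.0, "error": str(exc)}
    print(json.dumps(result))


if __name__ == "__main__":
    main()
```

Saved as a file, it can be exercised the same way as the judge: pipe a {"candidate": ...} payload into python3 and read the JSON it prints.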
For dataset-aware evaluators:
```bash
optimize-anything generate-evaluator seed.txt \
  --objective "Score correctness" \
  --dataset examples.jsonl > eval_dataset.py

echo '{"candidate":"text","example":{"input":"q","expected":"a"}}' | python3 eval_dataset.py
```
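To show what a dataset-aware evaluator can do with example, here is a minimal hand-written sketch (not the template generate-evaluator emits). It assumes the input/expected keys used in the echo command above and scores with a simplified token-overlap F1 (case-insensitive, whitespace tokens, no punctuation stripping); the metric and the feedback/error fields are assumptions, so substitute whatever correctness check fits your examples.jsonl.

```python
#!/usr/bin/env python3
"""Dataset-aware evaluator sketch: compares the candidate to example["expected"].

Assumes each dataset row has "input" and "expected" keys (the shape used in the
echo example); adjust the key names to match your own examples.jsonl.
"""
import json
import sys
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-words F1 between two strings (case-insensitive, whitespace tokens)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty strings agree; an empty string shares nothing with a non-empty one.
        return float(pred == ref)
    common = sum((Counter(pred) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def score(payload: dict) -> dict:
    candidate = str(payload["candidate"])
    example = payload.get("example") or {}
    expected = str(example.get("expected", ""))
    f1 = token_f1(candidate, expected)
    return {
        "score": round(f1, 3),
        # Echo the example so the reflection step can see what was expected.
        "input": example.get("input"),
        "expected": expected,
        "feedback": f"token F1 {f1:.2f} against the expected answer",
    }


def main() -> None:
    try:
        result = score(json.load(sys.stdin))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        result = {"score": 0.0, "error": str(exc)}
    print(json.dumps(result))


if __name__ == "__main__":
    main()
```

Token F1 gives partial credit for near misses, which a strict equality check would score as 0.0; whether that is desirable depends on the task.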
Whatever the evaluator type, return score plus diagnostic fields.
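One way to return score plus diagnostic fields, and to surface weak dimensions for reflection, is to fold per-dimension scores into a single score and name the dimensions that fall short. The sketch below assumes an unweighted mean and a hypothetical 0.75 threshold; the weakest_dimension and feedback field names are illustrative too, and the generated judge template may aggregate differently.

```python
#!/usr/bin/env python3
"""Sketch: fold per-dimension scores into a score plus diagnostic side info."""
import json
from statistics import mean


def summarize(dimensions: dict, threshold: float = 0.75) -> dict:
    """Average dimension scores and name the ones below `threshold`.

    `threshold` and the `weakest_dimension`/`feedback` field names are hypothetical.
    """
    if not dimensions:
        return {"score": 0.0, "feedback": "no dimension scores were produced"}
    # Clamp each dimension so an out-of-range judge value stays within [0, 1].
    clamped = {name: min(1.0, max(0.0, float(v))) for name, v in dimensions.items()}
    weakest = min(clamped, key=clamped.get)
    below = sorted(name for name, v in clamped.items() if v < threshold)
    if below:
        feedback = (
            f"Improve {', '.join(below)}; weakest is {weakest} "
            f"at {clamped[weakest]:.2f}."
        )
    else:
        feedback = "All dimensions meet the threshold."
    return {
        **clamped,  # keep each dimension as a top-level field, like the judge output
        "score": round(mean(clamped.values()), 3),
        "weakest_dimension": weakest,
        "feedback": feedback,
    }


if __name__ == "__main__":
    print(json.dumps(summarize({"clarity": 0.9, "specificity": 0.7})))
```

Naming the weakest dimension in plain text gives the reflection step a specific target, rather than only a number to push upward.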