From gepa-anywhere
Drive a GEPA reflective-optimization loop on any text artifact in any repo, using THIS Claude Code session as the free (Max-billed) reflection LM. The Python CLI `gepa run` handles GEPA's optimization math + checkpointing; when it needs the reflection LM it writes a pending envelope and exits 42 — you read it, propose improved artifact text, write the response, and re-invoke. Use when the user says "run gepa", "optimize this prompt/instructions", "evolve the artifact", "improve the extraction prompt", AND a `.gepa/config.yaml` exists (or they want one — then use gepa-scaffold first).
How this skill is triggered — by the user, by Claude, or both
Slash command
/gepa-anywhere:gepa-runThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
`gepa run --config .gepa/config.yaml` walks GEPA's `optimize()` loop. When GEPA
gepa run --config .gepa/config.yaml walks GEPA's optimize() loop. When GEPA
needs the reflection LM (to propose a better candidate), the CLI writes a
pending-*.json envelope under the run dir and exits with code 42. You read
the envelope, propose improved text, write a response-*.json, and re-invoke
gepa run. The CLI resumes from GEPA's run_dir checkpoint — state survives
across every exit.
Cost model: rollout + metric run as local subprocesses (command mode) — no
model cost. Reflection happens inline in THIS session — billed to the user's Max
subscription, no API key. (Subagent rollout/metric, when configured, dispatch via
the Agent tool and also bill as session work.)
0 — done; see <run_dir>/done.json and <run_dir>/holdout-report.json.42 — suspended on a pending request; see <run_dir>/pending-latest.json.1+ — error; read stderr.uv sync (one-time). The CLI is
invoked as gepa-anywhere run ... (or uv run gepa-anywhere run ...)..gepa/config.yaml exists and artifact.path points at the text
artifact. If not, run gepa-scaffold first..gepa/rollout.sh + .gepa/metric.py are implemented and
.gepa/golden/manifest.jsonl lists labeled examples with train/val/holdout
splits. The metric + golden set are what make the run worth anything.gepa-anywhere run --config .gepa/config.yaml
Then repeat:
uv run gepa-anywhere run --config .gepa/config.yaml).done.json (best_idx, total_metric_calls, cost) and
holdout-report.json (seed_mean vs best_mean on the never-optimized split).
Report the lift and whether holdout moved. Hand off to gepa-frontier to
inspect + promote the winner. STOP.<run_dir>/pending-latest.json. Dispatch on kind:
reflection → fulfill it yourself (below): propose improved component text.rollout → for each request, dispatch the Agent tool to run the system
under optimization (follow the request's prompt), writing each replica's
result to its output path. Dispatch in parallel.metric → for each request, dispatch the gepa-judge subagent to score
the outputs vs gold (follow the prompt), writing {"score", "feedback"}
to the request's response_path. Dispatch in parallel.
Each envelope carries its own instructions_for_session. After writing all
responses/outputs, GOTO 1.reflection envelopeThe payload has components — one entry per component GEPA is updating, each with
its current_text and examples (a list of {Inputs, Generated Outputs: {score}, Feedback} rows). For EACH component:
current_text and the per-example Feedback (sorted: fix the
lowest-scoring examples first). The feedback is concrete metric output — use it.prior_attempts is present, those hypotheses were already tried and
rejected — propose a different mechanism, not a reword.Write a JSON object to response_path containing ONLY the components you changed:
{ "prompt": "<the full improved text>" }
(For a multi-component artifact, include each changed component by name.) Then
re-invoke gepa run.
gepa-anywhere state --config .gepa/config.yaml prints whether the run is
suspended / done / in-flight at any time.done.json reports the run stopped on max_metric_calls,
say so. Don't hand-edit the artifact mid-run — promotion is a deliberate
gepa-frontier step.npx claudepluginhub evanfabry/gepa-anywhere --plugin gepa-anywhereCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.