
gepa-run

Drive a GEPA reflective-optimization loop on any text artifact in any repo, using THIS Claude Code session as the reflection LM (billed to the user's Max subscription, so no separate API cost). The Python CLI `gepa run` handles GEPA's optimization math and checkpointing. When it needs the reflection LM, it writes a pending envelope and exits with code 42; you read the envelope, propose improved artifact text, write the response, and re-invoke the CLI. Use when the user says "run gepa", "optimize this prompt/instructions", "evolve the artifact", or "improve the extraction prompt", AND a `.gepa/config.yaml` exists (if they want one but it does not exist yet, run gepa-scaffold first).

Invocation: slash command /gepa-anywhere:gepa-run


SKILL.md: 90 lines, ~1.2k tokens · Language: Python · Stars: 0 · Maintenance: Excellent · Last commit: Jun 9, 2026


gepa run --config .gepa/config.yaml (invoked as gepa-anywhere run; see Prerequisites) walks GEPA's optimize() loop. When GEPA needs the reflection LM to propose a better candidate, the CLI writes a pending-*.json envelope under the run dir and exits with code 42. You read the envelope, propose improved text, write a response-*.json, and re-invoke gepa run. The CLI resumes from GEPA's run_dir checkpoint, so state survives every exit.
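The suspend-by-exit handshake can be pictured from the CLI's side as below. This is an illustration under stated assumptions, not gepa-anywhere's actual code: the helpers `request_key` and `request_or_suspend`, the hash-derived file names, and any envelope fields beyond `kind` and `response_path` are hypothetical. What it shows is the mechanism: a request whose response already exists on disk is answered from disk, so re-invoking after exit 42 picks up exactly where the run stopped.

```python
import hashlib
import json
import sys
from pathlib import Path

SUSPEND_EXIT_CODE = 42  # the exit code this skill documents for "suspended on a pending request"


def request_key(request: dict) -> str:
    """Derive a stable key from request content (hypothetical scheme)."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def request_or_suspend(run_dir: Path, kind: str, payload: dict) -> dict:
    """Return the session's response if it exists; otherwise write an envelope and exit 42.

    Hypothetical sketch, not gepa-anywhere's implementation.
    """
    key = request_key({"kind": kind, "payload": payload})
    response_path = run_dir / f"response-{key}.json"
    if response_path.exists():
        # Re-invoked run: this exact request was already answered, so resume from disk.
        return json.loads(response_path.read_text())
    envelope = {"kind": kind, "payload": payload, "response_path": str(response_path)}
    text = json.dumps(envelope, indent=2)
    (run_dir / f"pending-{key}.json").write_text(text)
    (run_dir / "pending-latest.json").write_text(text)
    # Suspend; the session fulfills the envelope and runs the CLI again.
    sys.exit(SUSPEND_EXIT_CODE)
```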

Cost model: in command mode, rollout and metric run as local subprocesses, so they incur no model cost. Reflection happens inline in THIS session, billed to the user's Max subscription with no API key needed. (When rollout or metric is configured to use subagents, they are dispatched via the Agent tool and also bill as session work.)

Exit codes

- 0 → done; see <run_dir>/done.json and <run_dir>/holdout-report.json.
- 42 → suspended on a pending request; see <run_dir>/pending-latest.json.
- 1+ (any nonzero code other than 42) → error; read stderr.
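A session driving the CLI only has to branch three ways on the exit code. A minimal sketch with a hypothetical `NextStep` enum and `next_step` helper; treating every nonzero code other than 42 as an error is this sketch's reading of "1+".

```python
from enum import Enum


class NextStep(Enum):
    REPORT_AND_STOP = "read done.json + holdout-report.json, report, hand off to gepa-frontier"
    FULFILL_PENDING = "read pending-latest.json, fulfill it, re-invoke the CLI"
    SURFACE_ERROR = "stop and show stderr to the user"


def next_step(exit_code: int) -> NextStep:
    """Map a `gepa-anywhere run` exit code to the session's next move (hypothetical helper)."""
    if exit_code == 0:
        return NextStep.REPORT_AND_STOP
    if exit_code == 42:
        return NextStep.FULFILL_PENDING
    # Any other code (1, 2, negative signal codes from subprocess, ...) is an error.
    return NextStep.SURFACE_ERROR
```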

Prerequisites (verify once per fresh run)

- Python env: in the gepa-anywhere repo, run uv sync once. The CLI is invoked as gepa-anywhere run ... (or uv run gepa-anywhere run ...).
- Config: .gepa/config.yaml exists and artifact.path points at the text artifact. If not, run gepa-scaffold first.
- Hooks + golden set: .gepa/rollout.sh and .gepa/metric.py are implemented, and .gepa/golden/manifest.jsonl lists labeled examples with train/val/holdout splits. The run is only as meaningful as the metric and the golden set.
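The file-level prerequisites can be checked mechanically before a fresh run. This is a sketch under assumptions: `check_prerequisites` is a hypothetical helper, and it assumes each manifest line is a JSON object with a `split` field, a schema the skill does not spell out. It does not parse config.yaml, so it cannot confirm that artifact.path points at a real file; that still needs a manual look or a YAML parser.

```python
import json
from pathlib import Path

REQUIRED_FILES = [
    ".gepa/config.yaml",
    ".gepa/rollout.sh",
    ".gepa/metric.py",
    ".gepa/golden/manifest.jsonl",
]
EXPECTED_SPLITS = {"train", "val", "holdout"}


def check_prerequisites(repo: Path) -> list[str]:
    """List layout problems under `repo`; an empty list means the files look ready.

    Assumes each manifest line is a JSON object with a "split" field (hypothetical schema).
    Does not parse config.yaml, so artifact.path is not verified here.
    """
    problems = [f"missing {rel}" for rel in REQUIRED_FILES if not (repo / rel).is_file()]
    manifest = repo / ".gepa/golden/manifest.jsonl"
    if not manifest.is_file():
        return problems
    splits = set()
    for lineno, line in enumerate(manifest.read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"manifest line {lineno} is not valid JSON")
            continue
        if isinstance(row, dict):
            splits.add(row.get("split"))
    missing = EXPECTED_SPLITS - splits
    if missing:
        problems.append("no examples in split(s): " + ", ".join(sorted(missing)))
    return problems
```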

The loop (do this until exit 0)

gepa-anywhere run --config .gepa/config.yaml

Then repeat:

1. Run the command above (or uv run gepa-anywhere run --config .gepa/config.yaml).
2. Exit 0 → done. Read done.json (best_idx, total_metric_calls, cost) and holdout-report.json (seed_mean vs best_mean on the never-optimized holdout split). Report the lift and whether the holdout score moved. Hand off to gepa-frontier to inspect and promote the winner. STOP.
3. Exit 42 → read <run_dir>/pending-latest.json and dispatch on its kind:
   - reflection → fulfill it yourself (see below): propose improved component text.
   - rollout → for each request, dispatch the Agent tool to run the system under optimization (following the request's prompt), writing each replica's result to its output path. Dispatch in parallel.
   - metric → for each request, dispatch the gepa-judge subagent to score the outputs against gold (following the prompt), writing a JSON object with "score" and "feedback" to the request's response_path. Dispatch in parallel.
   Each envelope carries its own instructions_for_session. After writing all responses/outputs, go back to step 1.
4. Exit 1+ → stop and surface the stderr to the user.
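As code, the loop is a re-invoke-until-zero driver that dispatches on the envelope's kind. In the real skill the Claude Code session is both the driver and the handlers; this sketch makes them injectable so the control flow is visible. Assumptions: the envelope has a top-level `kind` field, holdout-report.json has numeric `seed_mean` and `best_mean`, "lift" is taken as their absolute difference, and `drive`, `run_real_cli`, and the `max_suspensions` safety cap are hypothetical names, not part of gepa-anywhere.

```python
import json
import subprocess
from pathlib import Path
from typing import Callable

CMD = ["gepa-anywhere", "run", "--config", ".gepa/config.yaml"]
Handler = Callable[[dict], None]


def run_real_cli() -> int:
    """Invoke the actual CLI (needs gepa-anywhere on PATH); stderr passes through."""
    return subprocess.run(CMD).returncode


def drive(
    run_cli: Callable[[], int],
    run_dir: Path,
    handlers: dict[str, Handler],
    max_suspensions: int = 1000,
) -> dict:
    """Re-invoke the CLI until exit 0, fulfilling each pending envelope in between.

    `handlers` maps an envelope kind ("reflection", "rollout", "metric") to a function
    that writes every response/output the envelope asks for. Hypothetical sketch.
    """
    for _ in range(max_suspensions):
        # Step 1: run the CLI.
        code = run_cli()
        # Step 2: done; report the holdout lift.
        if code == 0:
            report = json.loads((run_dir / "holdout-report.json").read_text())
            report["lift"] = report["best_mean"] - report["seed_mean"]
            return report
        # Step 4: any code other than 0 or 42 is an error.
        if code != 42:
            raise RuntimeError(f"gepa-anywhere run exited {code}; read its stderr")
        # Step 3: suspended; fulfill the pending envelope.
        envelope = json.loads((run_dir / "pending-latest.json").read_text())
        handler = handlers.get(envelope["kind"])
        if handler is None:
            raise ValueError(f"unknown envelope kind: {envelope['kind']!r}")
        handler(envelope)  # write all responses/outputs, then loop back to step 1
    raise RuntimeError("too many suspensions; is a handler failing to write its responses?")
```

Passing run_real_cli as run_cli would shell out to the real CLI; the handlers are where reflection, rollout, and metric envelopes get fulfilled.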

Fulfilling a `reflection` envelope

The payload has components: one entry per component GEPA is updating, each with its current_text and examples (a list of {Inputs, Generated Outputs: {score}, Feedback} rows). For EACH component:

1. Read current_text and the per-example Feedback, working from the lowest-scoring examples up. The feedback is concrete metric output; use it.
2. Propose an improved full text for that component that addresses the feedback. Make a minimal, coherent edit; do not pad or bloat. Prefer tightening or removing text over accreting it.
3. If prior_attempts is present, those hypotheses were already tried and rejected. Propose a different mechanism, not a reword.
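Working worst-first is a sort on each example's score. A small sketch, assuming rows shaped like the {Inputs, Generated Outputs: {score}, Feedback} description above, with the score nested under "Generated Outputs"; `example_score` and `feedback_brief` are hypothetical helpers, and rows without a numeric score are pushed to the end.

```python
import math


def example_score(row: dict) -> float:
    """Score of one reflection example; +inf when no numeric score is present."""
    outputs = row.get("Generated Outputs")
    value = outputs.get("score") if isinstance(outputs, dict) else None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return math.inf


def feedback_brief(examples: list[dict], limit: int = 5) -> str:
    """Worst-first digest of per-example Feedback to read before rewriting current_text."""
    ranked = sorted(examples, key=example_score)  # stable sort: ties keep payload order
    return "\n".join(
        f"[score={example_score(row):g}] {str(row.get('Feedback', '')).strip()}"
        for row in ranked[:limit]
    )
```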

Write a JSON object to response_path containing ONLY the components you changed:

{ "prompt": "<the full improved text>" }

(For a multi-component artifact, include each changed component by name.) Then re-invoke gepa run.
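Writing only the changed components amounts to diffing the proposed texts against each component's current_text. A minimal sketch with a hypothetical `write_reflection_response` helper; dropping names that were not in the envelope is this sketch's choice, and how the CLI treats an empty object (nothing changed) is not specified by the skill.

```python
import json
from pathlib import Path


def write_reflection_response(
    response_path: Path, current: dict[str, str], proposed: dict[str, str]
) -> dict[str, str]:
    """Write {component_name: full_new_text} for components whose text actually changed.

    `current` holds each component's current_text from the envelope; `proposed` holds
    the session's full rewrites. Names absent from `current` are dropped (sketch choice).
    """
    changed = {
        name: text
        for name, text in proposed.items()
        if name in current and text != current[name]
    }
    response_path.write_text(json.dumps(changed, indent=2, ensure_ascii=False))
    return changed
```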

Notes

- Re-invoking after a crash is safe: requests are keyed by content, responses are cached, and rollouts are cached per (candidate, example), so the loop is fully resumable.
- gepa-anywhere state --config .gepa/config.yaml prints whether the run is suspended, done, or in flight at any time.
- Honor the budget: when done.json reports that the run stopped on max_metric_calls, say so. Don't hand-edit the artifact mid-run; promotion is a deliberate gepa-frontier step.
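The per-(candidate, example) rollout cache behind the resumability note can be sketched as content-addressed files. Assumptions: `rollout_cache_path` and `cached_rollout` are hypothetical, a candidate is a dict of component name to text, and the real CLI's cache layout may differ. Because the key hashes the candidate text and example id rather than run order, a resumed run that re-evaluates the same pair reads the stored result instead of re-running the rollout.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def rollout_cache_path(cache_dir: Path, candidate: dict[str, str], example_id: str) -> Path:
    """Cache file for one (candidate, example) pair, keyed by content rather than run order."""
    blob = json.dumps({"candidate": candidate, "example": example_id}, sort_keys=True)
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    return cache_dir / f"rollout-{digest[:16]}.json"


def cached_rollout(
    cache_dir: Path,
    candidate: dict[str, str],
    example_id: str,
    run_rollout: Callable[[dict[str, str], str], dict],
) -> dict:
    """Return the cached result for this pair, running the rollout only on a cache miss."""
    path = rollout_cache_path(cache_dir, candidate, example_id)
    if path.exists():
        return json.loads(path.read_text())
    result = run_rollout(candidate, example_id)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(result))
    # Rename into place so a crash mid-write never leaves a partial entry (atomic on POSIX).
    tmp.replace(path)
    return result
```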
