From gepa-anywhere
Lay out a repository to hill-climb (optimize) ARBITRARILY MANY agents with gepa, each with its own grounded evaluator — by hand-creating the `.gepa/agents/<name>/` directories + a registry convention. Use once per repo, before building evaluators or running optimizations, when the user says "I want to optimize several agents/prompts in this repo", "set up a multi-agent gepa project", "initialize gepa for many agents", or "add gepa to this project" AND more than one agent will be optimized. NOT for building the evaluator itself (that's evaluator-discovery) or running the loop (that's gepa-run). For a single prompt + a code metric, prefer `gepa scaffold` (the flat `.gepa/config.yaml` quickstart) — this skill is the multi-agent superset and produces a DIFFERENT, incompatible layout, so don't mix them in one repo.
Slash command
/gepa-anywhere:gepa-init
One repo, many optimizable agents. This stands up the shared structure so each agent gets its
own artifact, external evaluator, input surfaces, anchor, and config — plus a registry to
track them. Entry point: run it once, then evaluator-discovery per agent, then gepa run per
agent.
These are hand-authored files. There is no gepa init CLI subcommand — you create the
directories and write the YAML/markdown directly. The registry is a convention these skills
maintain; gepa itself only ever consumes a --config <path>. Don't wait for a tool to flip a
status — the skills do it by editing the file. (This deliberately differs from gepa scaffold,
which writes a single flat .gepa/config.yaml for the one-prompt case.)
.gepa/
  registry.yaml          # convention: every agent + its evaluator + status/trust
  agents/
    <name>/
      config.yaml        # gepa config for THIS agent (see the path rule below)
      manifest.jsonl     # the agent's dataset: rows {input, gold?, split} to optimize on
      target.md          # what the output is + success criteria (evaluator-discovery §1)
      surfaces/          # the inputs the agent saw + parity_test.sh (evaluator-discovery §2)
      evaluator.md       # the EXTERNAL judge rubric (evaluator-discovery §3–5)
      anchors.jsonl      # correctness anchor: {id, clean, corrupted, corruption} pairs (NOT a dataset)
      calibration/       # calibration runs + report.md + the evidence-gated trust level
      runs/              # gitignored: this agent's run state (see path rule)
anchors.jsonl is the evaluator's correctness anchor (clean≻corrupted pairs), distinct from
manifest.jsonl (the agent's optimization dataset). Idempotent: never clobber an existing agent
dir/config; only fill gaps.
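The "only fill gaps" rule can be sketched as a small helper that creates missing directories and reports which files still need to be written by hand. This is a minimal sketch, not part of gepa: `ensure_agent_dirs`, `AGENT_DIRS`, and `AGENT_FILES` are hypothetical names, and the helper deliberately writes no file contents, since an empty `config.yaml` would not be a valid config.

```python
from pathlib import Path

# Hypothetical names for illustration; gepa ships no such helper.
AGENT_DIRS = ("surfaces", "calibration", "runs")
AGENT_FILES = ("config.yaml", "manifest.jsonl", "target.md", "evaluator.md", "anchors.jsonl")


def ensure_agent_dirs(repo_root: Path, name: str) -> list[str]:
    """Create any missing directories for one agent; never touch existing files.

    Returns the hand-authored files that are still absent.
    """
    agent_dir = repo_root / ".gepa" / "agents" / name
    for sub in AGENT_DIRS:
        # exist_ok=True makes a second run a no-op for directories that already exist.
        (agent_dir / sub).mkdir(parents=True, exist_ok=True)
    return [f for f in AGENT_FILES if not (agent_dir / f).exists()]
```

Running it twice is safe: existing directories are left alone and existing files are only checked for presence, never opened for writing.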
The loop computes repo_dir = config_path.parent.parent. For a per-agent config at
.gepa/agents/<name>/config.yaml, repo_dir is therefore .gepa/agents/, and every
artifact.path / dataset.manifest resolves from there. So write them accordingly:
artifact.path: ../../prompts/<name>.md
dataset.manifest: <name>/manifest.jsonl
Run state goes in .gepa/agents/<name>/runs/.
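To see why these relative paths work, the path rule can be replayed with `pathlib`. This is only an illustration of the stated rule (`repo_dir = config_path.parent.parent`), not gepa's actual resolution code; `resolve_from_config` is a hypothetical helper and `extract` is an example agent name.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_from_config(config_path: str, relative: str) -> str:
    """Resolve a config-relative path the way the path rule describes."""
    repo_dir = PurePosixPath(config_path).parent.parent  # .gepa/agents for a per-agent config
    # normpath collapses the ../ segments so the final location is easy to read.
    return posixpath.normpath(str(repo_dir / relative))


config = ".gepa/agents/extract/config.yaml"
print(resolve_from_config(config, "../../prompts/extract.md"))  # prompts/extract.md
print(resolve_from_config(config, "extract/manifest.jsonl"))    # .gepa/agents/extract/manifest.jsonl
```

The `../../` climbs from `.gepa/agents/` back to the repository root, while the manifest path has to repeat `<name>/` because resolution starts one level above the agent directory.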
Gitignore .gepa/agents/*/runs/ (and .gepa/runs/). (Roadmap: a core --repo-dir flag would
remove the ../../ and is the clean fix; until then, the relative paths above work.)
config.yaml per agent
RunConfig.load requires all of artifact/dataset/rollout/metric/budget/reflection; an
incomplete stub fails gepa state. Write a valid one:
run_id: <name>-001
artifact: { mode: single_file, path: ../../prompts/<name>.md }
dataset: { manifest: <name>/manifest.jsonl, splits: { train: 0.6, val: 0.2, holdout: 0.2 } }
rollout:
  mode: subagent        # the agent IS an llm following the prompt; use command if it's a script
  replicas: 3
  subagent: { prompt: "Follow {artifact} on {input}; write the result to {output}." }
metric:
  mode: subagent        # the calibrated judge; names evaluator.md + surfaces literally (no token exists)
  subagent:
    prompt: |
      Read the rubric at .gepa/agents/<name>/evaluator.md and the surfaces in
      .gepa/agents/<name>/surfaces/. Score the outputs in {outputs} per the rubric, grounded in
      those surfaces. Write {"score": <0..1>, "feedback": "<specific, surface-grounded>"}.
budget: { max_metric_calls: 30 }
reflection: { mode: session }
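Since a config with missing sections is rejected, a quick pre-flight check can catch gaps before gepa is invoked. The sketch below assumes the config has already been parsed into a dict (for example by a YAML library); `missing_sections` is a hypothetical helper, not part of gepa, and it checks only for the six section names listed above, not their contents.

```python
# Section names taken from the requirement stated above; the helper itself is illustrative.
REQUIRED_SECTIONS = ("artifact", "dataset", "rollout", "metric", "budget", "reflection")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections that are absent or empty."""
    return [key for key in REQUIRED_SECTIONS if not config.get(key)]


# An incomplete stub: only run_id and artifact are filled in.
stub = {
    "run_id": "extract-001",
    "artifact": {"mode": "single_file", "path": "../../prompts/extract.md"},
}
print(missing_sections(stub))  # ['dataset', 'rollout', 'metric', 'budget', 'reflection']
```

An empty list means all six sections are present; it does not guarantee that the real loader accepts the values inside them.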
1. mkdir -p .gepa/agents; create registry.yaml with agents: [] if absent;
   append .gepa/agents/*/runs/ to .gitignore.
2. Per agent, create .gepa/agents/<name>/ with the layout above and the
   minimal valid config.yaml (correct relative paths). Add a registry row with
   status: needs-evaluator.
3. Run evaluator-discovery for <name> (it
   builds + calibrates evaluator.md and hand-updates the registry to status: ready,
   trust: <level>), after which gepa-run optimizes it.
agents:
  - name: extract
    config: .gepa/agents/extract/config.yaml
    evaluator: .gepa/agents/extract/evaluator.md
    status: needs-evaluator | calibrating | ready   # maintained by the skills, not a tool
    trust: unknown | low | medium | high            # = anchor coverage/independence, NOT self-consistency
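The registry edits the skills make by hand amount to two operations: add a row if it is missing, and later update its status and trust. The sketch below models them on an already-parsed registry dict. `add_agent` and `set_status` are hypothetical names, and starting a new row at `trust: unknown` is an assumption; the source only says new rows get `status: needs-evaluator`.

```python
STATUSES = ("needs-evaluator", "calibrating", "ready")
TRUST_LEVELS = ("unknown", "low", "medium", "high")


def add_agent(registry: dict, name: str) -> dict:
    """Add a registry row for `name` unless one exists; return the row either way."""
    agents = registry.setdefault("agents", [])
    for row in agents:
        if row["name"] == name:
            return row  # idempotent: never clobber an existing row
    row = {
        "name": name,
        "config": f".gepa/agents/{name}/config.yaml",
        "evaluator": f".gepa/agents/{name}/evaluator.md",
        "status": "needs-evaluator",
        "trust": "unknown",  # assumption: initial trust is not specified in the source
    }
    agents.append(row)
    return row


def set_status(registry: dict, name: str, status: str, trust: str) -> None:
    """Update one agent's status and trust, rejecting values outside the convention."""
    if status not in STATUSES or trust not in TRUST_LEVELS:
        raise ValueError(f"unknown status/trust: {status!r}, {trust!r}")
    for row in registry.get("agents", []):
        if row["name"] == name:
            row["status"], row["trust"] = status, trust
            return
    raise KeyError(name)


registry = {"agents": []}
add_agent(registry, "extract")
set_status(registry, "extract", "ready", "medium")
```

Because gepa itself only consumes a config path, nothing enforces these values; the validation here simply keeps hand edits within the convention.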
Real repos have several agents (extractor, summarizer, router) that each need a different
grounded evaluator and notion of "good". Per-agent dirs keep their artifact/evaluator/surfaces/
anchor isolated and independently hill-climbable; the registry is the one place to see what's
optimizable and how much each evaluator is trusted. Adding the Nth agent is one more
agents/<name>/ + one registry row. Per agent: evaluator-discovery → gepa-run → gepa-frontier.
npx claudepluginhub evanfabry/gepa-anywhere --plugin gepa-anywhere