Skill

gepa-init

Lay out a repository to hill-climb (optimize) ARBITRARILY MANY agents with gepa, each with its own grounded evaluator — by hand-creating the `.gepa/agents/<name>/` directories + a registry convention. Use once per repo, before building evaluators or running optimizations, when the user says "I want to optimize several agents/prompts in this repo", "set up a multi-agent gepa project", "initialize gepa for many agents", or "add gepa to this project" AND more than one agent will be optimized. NOT for building the evaluator itself (that's evaluator-discovery) or running the loop (that's gepa-run). For a single prompt + a code metric, prefer `gepa scaffold` (the flat `.gepa/config.yaml` quickstart) — this skill is the multi-agent superset and produces a DIFFERENT, incompatible layout, so don't mix them in one repo.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/gepa-anywhere:gepa-init

SKILL.md

102 lines · ~1.5k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 9, 2026

gepa-init

One repo, many optimizable agents. This stands up the shared structure so each agent gets its own artifact, external evaluator, input surfaces, anchor, and config — plus a registry to track them. Entry point: run it once, then evaluator-discovery per agent, then gepa run per agent.

These are hand-authored files. There is no gepa init CLI subcommand — you create the directories and write the YAML/markdown directly. The registry is a convention these skills maintain; gepa itself only ever consumes a --config <path>. Don't wait for a tool to flip a status — the skills do it by editing the file. (This deliberately differs from gepa scaffold, which writes a single flat .gepa/config.yaml for the one-prompt case.)

Layout

.gepa/
  registry.yaml                 # convention: every agent + its evaluator + status/trust
  agents/
    <name>/
      config.yaml               # gepa config for THIS agent (see the path rule below)
      manifest.jsonl            # the agent's dataset: rows {input, gold?, split} to optimize on
      target.md                 # what the output is + success criteria (evaluator-discovery §1)
      surfaces/                 # the inputs the agent saw + parity_test.sh (evaluator-discovery §2)
      evaluator.md              # the EXTERNAL judge rubric (evaluator-discovery §3–5)
      anchors.jsonl             # correctness anchor: {id, clean, corrupted, corruption} pairs (NOT a dataset)
      calibration/              # calibration runs + report.md + the evidence-gated trust level
      runs/                     # gitignored: this agent's run state (see path rule)

anchors.jsonl is the evaluator's correctness anchor (clean≻corrupted pairs), distinct from manifest.jsonl (the agent's optimization dataset). Idempotent: never clobber an existing agent dir/config; only fill gaps.
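
To make the difference between the two JSONL files concrete, here is a minimal Python sketch of a row-shape check. The field names come from the layout above. The helper names (`check_manifest_row`, `check_anchor_row`) and the allowed split names (borrowed from the `splits` keys in the minimal config) are assumptions for illustration, not part of gepa.

```python
import json

# Assumption: split names mirror the `splits` keys used in the minimal config.
ALLOWED_SPLITS = {"train", "val", "holdout"}
ANCHOR_FIELDS = {"id", "clean", "corrupted", "corruption"}


def check_manifest_row(line: str) -> list[str]:
    """Return problems found in one manifest.jsonl line (empty list means OK)."""
    row = json.loads(line)
    problems = []
    if "input" not in row:
        problems.append("missing 'input'")
    if row.get("split") not in ALLOWED_SPLITS:
        problems.append("'split' must be one of " + ", ".join(sorted(ALLOWED_SPLITS)))
    # 'gold' is optional, so its absence is not reported.
    return problems


def check_anchor_row(line: str) -> list[str]:
    """Return problems found in one anchors.jsonl line (empty list means OK)."""
    row = json.loads(line)
    return [f"missing '{field}'" for field in sorted(ANCHOR_FIELDS - set(row))]
```

A manifest row describes one input to optimize on, while an anchor row always carries a clean/corrupted pair, so the two files should never be validated with the same schema.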

Critical: path resolution

The loop computes repo_dir = config_path.parent.parent. For a per-agent config at .gepa/agents/<name>/config.yaml, repo_dir is therefore .gepa/agents/, and every artifact.path / dataset.manifest resolves from there. So write them accordingly:

the agent's prompt living at repo root → artifact.path: ../../prompts/<name>.md
the agent's dataset under its own dir → dataset.manifest: <name>/manifest.jsonl
Runs land in .gepa/agents/<name>/runs/, so gitignore .gepa/agents/*/runs/ (and .gepa/runs/). (Roadmap: a core --repo-dir flag would remove the need for ../../ and is the clean fix; until then, the relative paths above work.)
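
The path rule above can be checked by hand with a few lines of Python. This is a sketch that mirrors the stated `config_path.parent.parent` rule using only the standard library; it is not gepa's own code, and `resolve_from_config` and the agent name `extract` are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_from_config(config_path: str, relative: str) -> str:
    """Resolve `relative` from config_path.parent.parent, as the rule describes."""
    repo_dir = PurePosixPath(config_path).parent.parent
    # normpath collapses the ../ segments textually; it never touches the filesystem.
    return posixpath.normpath(str(repo_dir / relative))


config = ".gepa/agents/extract/config.yaml"
print(resolve_from_config(config, "../../prompts/extract.md"))  # prompts/extract.md
print(resolve_from_config(config, "extract/manifest.jsonl"))    # .gepa/agents/extract/manifest.jsonl
# A repo-root-style path would resolve to the wrong place:
print(resolve_from_config(config, "prompts/extract.md"))        # .gepa/agents/prompts/extract.md
```

The last call shows why a path written as if from the repo root silently points inside `.gepa/agents/` instead.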

Minimal valid `config.yaml` per agent

RunConfig.load requires all of artifact/dataset/rollout/metric/budget/reflection — an incomplete stub fails gepa state. Write a valid one:

run_id: <name>-001
artifact: { mode: single_file, path: ../../prompts/<name>.md }
dataset: { manifest: <name>/manifest.jsonl, splits: { train: 0.6, val: 0.2, holdout: 0.2 } }
rollout:
  mode: subagent            # the agent IS an LLM following the prompt; use command if it's a script
  replicas: 3
  subagent: { prompt: "Follow {artifact} on {input}; write the result to {output}." }
metric:
  mode: subagent            # the calibrated judge — names evaluator.md + surfaces literally (no token exists)
  subagent:
    prompt: |
      Read the rubric at .gepa/agents/<name>/evaluator.md and the surfaces in
      .gepa/agents/<name>/surfaces/. Score the outputs in {outputs} per the rubric, grounded in
      those surfaces. Write {"score": <0..1>, "feedback": "<specific, surface-grounded>"}.
budget: { max_metric_calls: 30 }
reflection: { mode: session }
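
Before handing a config to gepa, a quick pre-check for the six required sections can catch an incomplete stub early. The sketch below works on an already-parsed config dictionary and only tests for the top-level keys named above. It is an illustrative helper (`missing_sections` is a hypothetical name), not a substitute for `RunConfig.load`, which may validate more than this.

```python
# The six top-level sections the text above says RunConfig.load requires.
REQUIRED_SECTIONS = ("artifact", "dataset", "rollout", "metric", "budget", "reflection")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# An incomplete stub: only run_id and artifact are present.
stub = {
    "run_id": "extract-001",
    "artifact": {"mode": "single_file", "path": "../../prompts/extract.md"},
}
print(missing_sections(stub))
# ['dataset', 'rollout', 'metric', 'budget', 'reflection']
```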

Steps

Skeleton. mkdir -p .gepa/agents; create registry.yaml with agents: [] if absent; append .gepa/agents/*/runs/ to .gitignore.
Per agent the user names. Create .gepa/agents/<name>/ with the layout above and the minimal valid config.yaml (correct relative paths). Add a registry row with status: needs-evaluator.
Hand off. For each agent, tell the user to run evaluator-discovery for <name> (it builds + calibrates evaluator.md and hand-updates the registry to status: ready, trust: <level>), after which gepa-run optimizes it.
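
The first two steps can be sketched in Python for readers who prefer a script to typing the commands by hand. This is a minimal, idempotent sketch under stated assumptions: the function names are hypothetical, only the directories are created (the per-agent files are still written by hand), and the registry row is left to the skill to add.

```python
from pathlib import Path

RUNS_IGNORE = ".gepa/agents/*/runs/"
# Subdirectories from the layout above; the per-agent files are authored separately.
AGENT_SUBDIRS = ("surfaces", "calibration", "runs")


def init_skeleton(repo_root: Path) -> None:
    """Step 1: create .gepa/agents, an empty registry, and the .gitignore entry."""
    (repo_root / ".gepa" / "agents").mkdir(parents=True, exist_ok=True)

    registry = repo_root / ".gepa" / "registry.yaml"
    if not registry.exists():  # never clobber an existing registry
        registry.write_text("agents: []\n")

    gitignore = repo_root / ".gitignore"
    existing = gitignore.read_text() if gitignore.exists() else ""
    if RUNS_IGNORE not in existing.splitlines():
        # Add a newline first if the file does not already end with one.
        prefix = "" if existing == "" or existing.endswith("\n") else "\n"
        gitignore.write_text(existing + prefix + RUNS_IGNORE + "\n")


def add_agent_dirs(repo_root: Path, name: str) -> Path:
    """Step 2 (directories only): create the agent dir and its subdirectories."""
    agent_dir = repo_root / ".gepa" / "agents" / name
    for sub in AGENT_SUBDIRS:
        (agent_dir / sub).mkdir(parents=True, exist_ok=True)
    return agent_dir
```

Running `init_skeleton` twice leaves the registry and `.gitignore` unchanged the second time, which matches the "only fill gaps" rule.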

registry.yaml shape

agents:
  - name: extract
    config: .gepa/agents/extract/config.yaml
    evaluator: .gepa/agents/extract/evaluator.md
    status: needs-evaluator | calibrating | ready        # maintained by the skills, not a tool
    trust: unknown | low | medium | high                 # = anchor coverage/independence, NOT self-consistency
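
In a real registry each row carries exactly one value for `status` and one for `trust`; the `|` lists above enumerate the options. The sketch below renders one row as YAML text and rejects values outside those lists. The helper name `registry_row` is hypothetical, and starting a new agent at `trust: unknown` is an assumption (the steps only specify `status: needs-evaluator`).

```python
STATUSES = ("needs-evaluator", "calibrating", "ready")
TRUST_LEVELS = ("unknown", "low", "medium", "high")


def registry_row(name: str, status: str = "needs-evaluator", trust: str = "unknown") -> str:
    """Render one registry.yaml row; trust defaulting to 'unknown' is an assumption."""
    if status not in STATUSES:
        raise ValueError(f"status must be one of {STATUSES}, got {status!r}")
    if trust not in TRUST_LEVELS:
        raise ValueError(f"trust must be one of {TRUST_LEVELS}, got {trust!r}")
    agent_dir = f".gepa/agents/{name}"
    return (
        f"  - name: {name}\n"
        f"    config: {agent_dir}/config.yaml\n"
        f"    evaluator: {agent_dir}/evaluator.md\n"
        f"    status: {status}\n"
        f"    trust: {trust}\n"
    )


print("agents:\n" + registry_row("extract"), end="")
```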

Why a registry + per-agent dirs

Real repos have several agents (extractor, summarizer, router) that each need a different grounded evaluator and notion of "good". Per-agent dirs keep their artifact/evaluator/surfaces/anchor isolated and independently hill-climbable; the registry is the one place to see what's optimizable and how much each evaluator is trusted. Adding the Nth agent is one more agents/<name>/ + one registry row. Per agent: evaluator-discovery → gepa-run → gepa-frontier.
