By EvanFabry
Scaffold-into-any-repo GEPA prompt/artifact optimization. Five skills: gepa-init (lay out a multi-agent repo), evaluator-discovery (build + calibrate a grounded evaluator for an agent), gepa-scaffold (single-prompt quickstart), gepa-run (drive the exit-42 session-reflection loop), gepa-frontier (inspect Pareto + promote the winner).
Build a trustworthy, anchored evaluator for an agent whose output you want to optimize, when no reliable metric/judge/gold exists yet. It learns exactly what the agent produces, gives the evaluator the SAME inputs the agent saw (building + testing those surfaces), drafts the evaluator as an EXTERNAL markdown rubric, has a DIVERSE expert panel (incl. an adversary) harden it, and calibrates it against a real anchor while hill-climbing self-consistency — then registers it so gepa can optimize the agent against it. Use whenever the user wants to optimize / improve / "make better" an agent's prompt or output and the way to MEASURE quality is missing or weak — e.g. "set up a judge for my extraction agent", "how do I score whether my summarizer is good", "optimize this prompt but I have no gold labels", "build an evaluator for <agent>", or BEFORE any `gepa run` whose metric is absent or untrustworthy. NOT for laying out the repo (that's gepa-init) or driving the optimization loop (that's gepa-run); this is only the build-the-metric step. The evaluator is the bottleneck on every optimization — do not skip building one well.
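As a rough illustration of the two quantities that calibration step trades off, the sketch below measures agreement with an anchor and run-to-run spread. Everything here is an assumption for illustration: the function names, the tolerance, and the choice of population standard deviation are hypothetical and are not taken from the skill itself.

```python
from statistics import mean, pstdev
from typing import Sequence


def anchor_agreement(
    evaluator_scores: Sequence[float],
    anchor_scores: Sequence[float],
    tolerance: float = 0.15,  # hypothetical threshold
) -> float:
    """Fraction of items where the evaluator lands within `tolerance` of the anchor."""
    if len(evaluator_scores) != len(anchor_scores):
        raise ValueError("score lists must have the same length")
    if not anchor_scores:
        raise ValueError("need at least one anchored item")
    hits = sum(
        1 for e, a in zip(evaluator_scores, anchor_scores) if abs(e - a) <= tolerance
    )
    return hits / len(anchor_scores)


def mean_spread(repeated_scores: Sequence[Sequence[float]]) -> float:
    """Average per-item standard deviation across repeated evaluator runs.

    Lower is better: 0.0 means the evaluator gave the same score every time.
    """
    return mean(pstdev(runs) for runs in repeated_scores)
```

An evaluator that agrees with the anchor but has a large spread is still noisy, and one with zero spread that disagrees with the anchor is consistently wrong, so both numbers are worth tracking together.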
Drive a GEPA optimization run to automatic completion, stopping when a stop-policy condition is met (budget exhausted, corroborated saturation, or too many invalid patches). Use when the user says "autorun gepa", "run until done", "keep optimizing automatically", or "stop when saturated", AND a `.gepa/config.yaml` exists. Wraps gepa-run with a decide_stop check between every iteration.
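The three stop conditions named above can be sketched as a single check run between iterations. Only the name `decide_stop` and the three conditions come from the description; the `RunStatus` fields, the thresholds, and the reading of "corroborated" as "confirmed by a second signal" are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunStatus:
    # All fields are hypothetical; the real run state may track different things.
    metric_calls_used: int
    metric_call_budget: int
    iterations_without_gain: int
    holdout_also_flat: bool
    invalid_patches: int


def decide_stop(
    status: RunStatus,
    saturation_window: int = 5,    # hypothetical default
    max_invalid_patches: int = 3,  # hypothetical default
) -> Optional[str]:
    """Return a reason to stop, or None to keep iterating."""
    if status.metric_calls_used >= status.metric_call_budget:
        return "budget exhausted"
    if status.invalid_patches >= max_invalid_patches:
        return "too many invalid patches"
    # Assumed reading of "corroborated": the plateau must show up on a second
    # signal too, so one noisy flat stretch does not end the run.
    if status.iterations_without_gain >= saturation_window and status.holdout_also_flat:
        return "corroborated saturation"
    return None
```

Returning the reason as a string, rather than a bare boolean, lets the caller report why the run ended.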
Inspect a finished gepa-anywhere run's Pareto frontier and deliberately promote the winning candidate back onto the artifact. Use after `gepa run` reaches exit 0, when the user says "show the frontier", "which candidate won", "promote the winner", "apply the best prompt", or wants to compare candidates / check the holdout report before committing.
Lay out a repository to hill-climb (optimize) ARBITRARILY MANY agents with gepa, each with its own grounded evaluator — by hand-creating the `.gepa/agents/<name>/` directories + a registry convention. Use once per repo, before building evaluators or running optimizations, when the user says "I want to optimize several agents/prompts in this repo", "set up a multi-agent gepa project", "initialize gepa for many agents", or "add gepa to this project" AND more than one agent will be optimized. NOT for building the evaluator itself (that's evaluator-discovery) or running the loop (that's gepa-run). For a single prompt + a code metric, prefer `gepa scaffold` (the flat `.gepa/config.yaml` quickstart) — this skill is the multi-agent superset and produces a DIFFERENT, incompatible layout, so don't mix them in one repo.
Drive a GEPA reflective-optimization loop on any text artifact in any repo, using THIS Claude Code session as the free (Max-billed) reflection LM. The Python CLI `gepa run` handles GEPA's optimization math + checkpointing; when it needs the reflection LM it writes a pending envelope and exits 42 — you read it, propose improved artifact text, write the response, and re-invoke. Use when the user says "run gepa", "optimize this prompt/instructions", "evolve the artifact", "improve the extraction prompt", AND a `.gepa/config.yaml` exists (or they want one — then use gepa-scaffold first).
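The suspend/resume cycle described above amounts to re-invoking the command for as long as it exits with code 42. The sketch below shows that control flow only. The `answer_pending` callback is hypothetical and stands in for reading the pending envelope and writing the response, because the envelope's location and format are not given here.

```python
import subprocess
from typing import Callable, Optional, Sequence

SUSPEND_EXIT_CODE = 42  # from the description: the CLI "exits 42" when it needs reflection


def drive_reflection_loop(
    command: Sequence[str],
    answer_pending: Callable[[], None],
    max_rounds: int = 100,  # hypothetical safety cap
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Re-invoke `command` until it stops exiting with the suspend code.

    Returns the final exit code. `run` is injectable so the loop can be
    exercised without the real CLI; by default it shells out via subprocess.
    """
    if run is None:
        run = lambda cmd: subprocess.run(list(cmd)).returncode
    for _ in range(max_rounds):
        code = run(command)
        if code != SUSPEND_EXIT_CODE:
            return code  # not suspended: finished or failed, caller decides
        answer_pending()  # read the envelope, write the response, then loop
    raise RuntimeError(f"still suspended after {max_rounds} rounds")
```

A call might look like `drive_reflection_loop(["gepa", "run", "--config", ".gepa/config.yaml"], answer_pending=my_callback)`, where `my_callback` is whatever produces the reflection response.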
Scaffold-into-any-repo GEPA reflective optimization for
any text artifact (a prompt, an instruction file, prompt fragments) — using the active
Claude Code session as the reflection LM (free, Max-billed), with host-supplied rollout +
metric hooks. A generalization of whiteboard/tools/whiteboard-gepa.
Status: M0–M4 built. Generic core (single-file and multi-component artifacts; command
and subagent rollout; command and subagent/LLM-judge metric; session and api
reflection) + scaffold + 3 skills. Validated end-to-end against two examples on the same core:
examples/cramer (real host, M3) and examples/whiteboard (multi-component generality proof,
runnable). See SPEC.md for the full design and the M0–M4 plan.
uv sync                                    # one-time, in this repo
# in any repo:
gepa scaffold                              # drops .gepa/{config.yaml, rollout.sh, metric.py, golden/}
# ...point artifact.path at your file, implement the hooks, add a golden set...
gepa run --config .gepa/config.yaml        # session drives the exit-42 reflection loop
gepa state --config .gepa/config.yaml      # suspended | done | in-flight
gepa frontier --config .gepa/config.yaml   # inspect the Pareto frontier; --promote applies the winner
gepa and gepa-anywhere are the same entry point. As a Claude Code plugin, the three skills
(gepa-scaffold, gepa-run, gepa-frontier) drive this conversationally — the session is the
reflection LM, so no API key is needed for the heavy LLM work.
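For readers unfamiliar with the term, a Pareto frontier over candidates can be sketched as the set of candidates that no other candidate beats everywhere. This is the textbook non-dominated filter over per-example scores, shown under the assumption that higher scores are better. The selection rule `gepa frontier` actually applies may differ, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates.

    `scores` maps a candidate name to its per-example scores, in the same
    example order for every candidate.
    """
    return [
        name
        for name, row in scores.items()
        if not any(
            dominates(other_row, row)
            for other_name, other_row in scores.items()
            if other_name != name
        )
    ]
```

A candidate can sit on the frontier without having the best average, which is why inspecting the frontier before promoting a winner is a separate, deliberate step.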
core/       generic loop: config, dataset/splits, candidate I/O, command/subagent hooks,
            ConfigDrivenGEPAAdapter, exit-42 suspend/resume, run lock, frontier, CLI
templates/  what `scaffold` drops into a host repo (mirrors core/scaffold.py constants)
skills/     gepa-scaffold | gepa-run | gepa-frontier
tests/      config/protocol/hook/adapter unit tests + a no-LLM end-to-end toy optimization
The generic core carries no host-domain strings (enforced by tests/test_nfr6_generic.py).
The harness is reusable; the metric and the golden set are the host's work — that's where the
quality of a run is decided.
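To make "the host's work" concrete, here is a minimal sketch of the kind of scoring a host might write for an extraction task: set-based F1 between extracted items and a hand-labeled golden set. It is not the interface the scaffolded `metric.py` expects, which is not documented here. The function names and the dictionary-of-sets data shape are assumptions for illustration.

```python
from typing import Dict, Set


def field_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 between the predicted and golden item sets for one example."""
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(predictions: Dict[str, Set[str]], golden: Dict[str, Set[str]]) -> float:
    """Average field_f1 over every golden example.

    A missing prediction is treated as an empty set.
    """
    if not golden:
        raise ValueError("golden set is empty")
    total = sum(
        field_f1(predictions.get(key, set()), gold) for key, gold in golden.items()
    )
    return total / len(golden)
```

Scoring every golden example, including those with no prediction, keeps a candidate from looking better simply because it produced output for fewer inputs.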
uv run pytest -q # full suite (the toy run exercises the real gepa.optimize loop)
First validation target (M3): scaffold into ~/trading/cramer to optimize its extraction prompt
against a hand-labeled golden set (SPEC §9, M3).
npx claudepluginhub evanfabry/gepa-anywhere --plugin gepa-anywhere