gepa

A CLI tool for LLM-guided optimization of any text artifact — prompts, code, configs, agent architectures — inspired by GEPA's optimize_anything API.

If you can write an evaluator for it, gepa can optimize it.

How It Works

You provide:

A seed artifact (or, in seedless mode, an --objective describing what you want in natural language)
An evaluator — a shell command that scores the artifact (0.0–1.0)

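gepa runs an iterative loop: evaluate → pass the score and diagnostics to an LLM → the LLM proposes an improvement → repeat, until the `--max-calls` budget of evaluator calls is used up. It tracks the best candidate across all iterations and keeps a Pareto frontier of candidates with different strengths (for example, ones that do best on different dataset examples) instead of keeping only the single top scorer.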

The LLM is driven by the Claude Code CLI (claude).
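The following minimal Python sketch makes the loop concrete. It is not gepa's Go implementation. `evaluate` and `propose` are hypothetical stand-ins for the evaluator command and the Claude call, diagnostics are left out for brevity, and choosing parents by rotating through the Pareto frontier is an assumed selection rule. The frontier itself is the set of candidates that no other candidate dominates, where dominating means scoring at least as well on every example and strictly better on one.

```python
from typing import Callable, Optional

Scores = list[float]

def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(pool: dict[str, Scores]) -> list[str]:
    """Candidates that no other candidate in the pool dominates."""
    return [
        cand
        for cand, s in pool.items()
        if not any(dominates(t, s) for other, t in pool.items() if other != cand)
    ]

def optimize(
    seed: str,
    examples: list[Optional[dict]],                    # [None] for single-task mode
    evaluate: Callable[[str, Optional[dict]], float],  # hypothetical: wraps the evaluator command
    propose: Callable[[str, Scores], str],             # hypothetical: wraps the LLM call
    max_calls: int,
) -> str:
    if not examples:
        raise ValueError("need at least one example (use [None] for single-task)")
    pool: dict[str, Scores] = {}
    candidate, calls, step = seed, 0, 0
    while True:
        # One evaluator call per example; the seed is always scored once.
        pool[candidate] = [evaluate(candidate, ex) for ex in examples]
        calls += len(examples)
        if calls + len(examples) > max_calls:
            break  # not enough budget left to score another candidate
        # Rotate through the frontier so candidates with different strengths all get mutated.
        front = pareto_front(pool)
        parent = front[step % len(front)]
        step += 1
        candidate = propose(parent, pool[parent])
    # "Best candidate across all iterations" here means the highest mean score.
    return max(pool, key=lambda c: sum(pool[c]) / len(pool[c]))
```

In a single-task run `examples` is `[None]`, so each candidate has one score and the frontier reduces to the top-scoring candidates. The frontier only starts to preserve distinct strengths once a dataset gives each candidate several scores to compare.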

Three Optimization Modes

Mode	Flags	Description
Single-task	`--seed` + `--evaluator`	Solve one hard problem. The artifact is the solution.
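Multi-task	Single-task flags + `--dataset`	Solve a batch of related problems, where improvements found on one can carry over to the others.
Generalization	Single-task flags + `--dataset` + `--valset`	Build a skill that transfers to unseen problems, checked against the held-out validation set.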

Installation

go install github.com/jonniesweb/gepa-cli/cmd/gepa@latest

Or build from source:

git clone https://github.com/jonniesweb/gepa-cli
cd gepa-cli
go build -o gepa ./cmd/gepa

Requirements: Claude Code CLI must be installed and on your PATH.

Usage

gepa optimize [options]

Options:
  --seed <file>         Starting artifact (file path, - for stdin, or omit for seedless mode)
  --objective <text>    What to optimize for (natural language); required in seedless mode
  --background <text>   Domain knowledge and constraints for the LLM
  --evaluator <cmd>     Shell command that scores a candidate (required)
  --dataset <file>      JSON file: array of training examples (activates multi-task mode)
  --valset <file>       JSON file: array of validation examples (activates generalization mode)
  --max-calls <n>       Max evaluator calls across all iterations (default: 50)
  --output <file>       Write best candidate to file (default: stdout)
  --engine <name>       AI engine: claude (default)
  -v, --verbose         Verbose output (show LLM prompts and full responses)
  -h, --help            Show this help

Evaluator Protocol

The evaluator is run as a shell command with these environment variables:

Variable	Value
`GEPA_CANDIDATE_FILE`	Path to a temp file containing the candidate
`GEPA_CANDIDATE`	Candidate text (full content)
`GEPA_EXAMPLE`	JSON-encoded example object (empty if single-task)
`GEPA_ITERATION`	Current iteration number
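Before handing an evaluator to gepa, you can run it once by hand with the same variables. The helper below sketches that under the assumption that the variables in the table are the evaluator's only inputs. `run_evaluator` is a hypothetical name, and the temp-file naming and the shell it uses (`/bin/sh` via `shell=True` on POSIX) may differ from what gepa itself does.

```python
import json
import os
import subprocess
import tempfile
from typing import Optional

def run_evaluator(cmd: str, candidate: str, example: Optional[dict], iteration: int) -> str:
    """Run an evaluator command with the GEPA_* variables set and return its stdout."""
    # GEPA_CANDIDATE_FILE must be a path, so write the candidate to a temp file first.
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        f.write(candidate)
        path = f.name
    env = dict(os.environ)
    env.update({
        "GEPA_CANDIDATE_FILE": path,
        "GEPA_CANDIDATE": candidate,
        # Empty in single-task mode, one JSON-encoded dataset element otherwise.
        "GEPA_EXAMPLE": "" if example is None else json.dumps(example),
        "GEPA_ITERATION": str(iteration),
    })
    try:
        # shell=True runs cmd through /bin/sh on POSIX systems.
        result = subprocess.run(cmd, shell=True, env=env, capture_output=True, text=True)
        return result.stdout
    finally:
        os.remove(path)
```

For example, `run_evaluator("./bench.sh", open("slow_script.sh").read(), None, 0)` returns whatever the evaluator printed, which should be one of the two formats described next.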

The evaluator prints to stdout:

{"score": 0.85, "diagnostics": {"Error": "...", "Output": "..."}}

Or just a plain float:

0.85

Higher scores are better (0.0–1.0 recommended). The diagnostics map is passed to the LLM as Actionable Side Information (ASI) — diagnostic feedback the LLM reads to understand failures and propose targeted improvements.
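A single `json.loads` can tell the two output forms apart, because a bare `0.85` is itself valid JSON. The sketch below assumes stdout contains nothing but the result. This README does not say how gepa treats extra output, so keep any logging on stderr. `parse_evaluator_output` is a hypothetical helper, not part of gepa.

```python
import json

def parse_evaluator_output(stdout: str) -> tuple[float, dict]:
    """Accept either {"score": ..., "diagnostics": {...}} or a bare number."""
    value = json.loads(stdout.strip())
    if isinstance(value, dict):
        # JSON form: diagnostics are optional and default to an empty map.
        return float(value["score"]), dict(value.get("diagnostics") or {})
    # Plain-float form: no diagnostics for the LLM to read.
    return float(value), {}
```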

Examples

Single-task: Optimize a Shell Script

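# evaluator: scores the candidate by runtime only (1.0 = instant, 0.0 = 5 s or slower); it does not check the script's output
cat > bench.sh << 'EOF'
#!/bin/bash
# Millisecond timing via %N requires GNU date (BSD/macOS date does not support it)
t=$(date +%s%N)
# Discard the candidate's own output so only the JSON result reaches stdout
if ! sh "$GEPA_CANDIDATE_FILE" > /dev/null 2>&1; then
  # A candidate that crashes must not score well just because it exited quickly
  echo '{"score": 0, "diagnostics": {"Error": "script exited with a non-zero status"}}'
  exit 0
fi
elapsed=$(( ($(date +%s%N) - t) / 1000000 ))
score=$(python3 -c "print(max(0, 1 - $elapsed / 5000))")
echo "{\"score\": $score, \"diagnostics\": {\"Runtime\": \"${elapsed}ms\"}}"
EOF
chmod +x bench.sh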

gepa optimize \
  --seed slow_script.sh \
  --objective "Make this script faster" \
  --evaluator ./bench.sh \
  --max-calls 20 \
  --output fast_script.sh

Multi-task: Optimize a Prompt Across Examples

# dataset: array of {input, expected_output} objects
gepa optimize \
  --seed prompt.txt \
  --objective "Improve accuracy on these reasoning tasks" \
  --evaluator 'python3 score_prompt.py' \
  --dataset train.json \
  --max-calls 40 \
  --output best_prompt.txt
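This README does not include `score_prompt.py`. The minimal sketch below makes several assumptions: each example in `train.json` has the `input` and `expected_output` fields described above, the candidate prompt is prepended to the input, answers are scored by exact match, and the model is called through Claude Code's non-interactive `claude -p` mode (any other model call would work too).

```python
import json
import os
import subprocess

def ask_claude(text: str) -> str:
    """Assumption: query the model via Claude Code's print mode (`claude -p`)."""
    result = subprocess.run(["claude", "-p", text], capture_output=True, text=True, check=True)
    return result.stdout.strip()

def score_example(prompt: str, example: dict, ask=ask_claude) -> dict:
    """Exact-match score for one {input, expected_output} example."""
    answer = ask(f"{prompt}\n\n{example['input']}").strip()
    expected = str(example["expected_output"]).strip()
    if answer == expected:
        return {"score": 1.0, "diagnostics": {}}
    # Report the concrete mismatch so the LLM can propose a targeted fix.
    return {
        "score": 0.0,
        "diagnostics": {"Input": str(example["input"]), "Expected": expected, "Output": answer},
    }

# Entry point when gepa runs `python3 score_prompt.py` with the GEPA_* variables set.
if __name__ == "__main__" and os.environ.get("GEPA_EXAMPLE"):
    with open(os.environ["GEPA_CANDIDATE_FILE"]) as f:
        prompt = f.read()
    example = json.loads(os.environ["GEPA_EXAMPLE"])  # one element of train.json
    print(json.dumps(score_example(prompt, example)))
```

Putting the input, expected answer, and actual output into `diagnostics` gives the LLM a specific failure to reason about, not just a 0.0.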

Generalization: Learn an Agent Skill

gepa optimize \
  --seed skill.md \
  --objective "Optimize this coding agent skill for the bleve repository" \
  --evaluator './run_agent_eval.sh' \
  --dataset train_tasks.json \
  --valset val_tasks.json \
  --max-calls 100 \
  --output optimized_skill.md
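Generalization mode is only meaningful if the validation tasks are problems the optimizer never trains on, so the two files should not share tasks. The sketch below produces both files from a single task list. It assumes a hypothetical `tasks.json` array and a 70/30 split, and it uses a fixed shuffle seed so the split is reproducible.

```python
import json
import random

def split_tasks(tasks: list, val_fraction: float = 0.3, seed: int = 0) -> tuple[list, list]:
    """Shuffle a copy of tasks deterministically and hold out val_fraction of them."""
    if len(tasks) < 2:
        raise ValueError("need at least two tasks to fill both a train and a val split")
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split on every run
    # Keep at least one task on each side.
    n_val = min(len(shuffled) - 1, max(1, round(len(shuffled) * val_fraction)))
    return shuffled[n_val:], shuffled[:n_val]

def write_split(src: str = "tasks.json", train_out: str = "train_tasks.json",
                val_out: str = "val_tasks.json") -> None:
    """Read a JSON array of tasks and write disjoint train/val files for gepa."""
    with open(src) as f:
        tasks = json.load(f)
    train, val = split_tasks(tasks)
    for path, part in ((train_out, train), (val_out, val)):
        with open(path, "w") as f:
            json.dump(part, f, indent=2)
```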

Seedless Mode: Generate From Scratch

gepa optimize \
  --objective "Write a Python function that reverses a string in O(n)" \
  --evaluator 'python3 test_reverse.py' \
  --max-calls 10 \
  --output reverse.py
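This README does not include `test_reverse.py` either. The sketch below assumes the candidate must define a function named `reverse_string`, which is a hypothetical name. State that name in `--objective` or `--background` so the LLM uses it. The sketch checks correctness only, not the O(n) requirement. It also `exec`s the candidate, so run it only where executing LLM-written code is acceptable.

```python
import json
import os

# Hypothetical test inputs: empty, single-character, palindrome, and non-ASCII strings.
CASES = ["", "a", "abc", "racecar", "héllo wörld"]

def evaluate(source: str) -> dict:
    """Run the candidate source and score reverse_string on CASES."""
    namespace: dict = {}
    try:
        exec(compile(source, "candidate", "exec"), namespace)
        fn = namespace["reverse_string"]
    except Exception as exc:  # syntax error, missing function, crash at load time
        return {"score": 0.0, "diagnostics": {"Error": f"{type(exc).__name__}: {exc}"}}
    failures = []
    for case in CASES:
        expected = case[::-1]
        try:
            got = fn(case)
        except Exception as exc:
            got = f"<raised {type(exc).__name__}: {exc}>"
        if got != expected:
            failures.append(f"reverse_string({case!r}) returned {got!r}, expected {expected!r}")
    score = 1 - len(failures) / len(CASES)
    diagnostics = {"Failures": "\n".join(failures)} if failures else {}
    return {"score": score, "diagnostics": diagnostics}

# Entry point when gepa runs `python3 test_reverse.py` with the candidate path set.
if __name__ == "__main__" and "GEPA_CANDIDATE_FILE" in os.environ:
    with open(os.environ["GEPA_CANDIDATE_FILE"]) as f:
        print(json.dumps(evaluate(f.read())))
```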

Inspiration

This tool is a CLI implementation of the ideas from GEPA's optimize_anything:

gepa

Similar Plugins

caveman

frontend-design

ui-design

claude-mem

marketing-skills

nanobanana

More by jonniesweb

worktree

beads-workflow

cxdb-logger

retro
