gepa
A CLI tool for LLM-guided optimization of any text artifact — prompts, code, configs, agent architectures — inspired by GEPA's optimize_anything API.
If you can write an evaluator for it, gepa can optimize it.
How It Works
You provide:
- A seed artifact (or describe what you want in natural language)
- An evaluator — a shell command that scores the artifact (0.0–1.0)
gepa runs an iterative loop: evaluate → pass score + diagnostics to an LLM → LLM proposes an improvement → repeat. It tracks the best candidate across all iterations and uses a Pareto frontier to preserve diverse strengths.
The LLM is driven by the Claude Code CLI (claude).
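The Pareto frontier mentioned above can be illustrated with a small sketch. It assumes each candidate has one score per example and uses the textbook definition of dominance; how gepa computes its frontier internally is not documented here, and the names `dominates` and `pareto_frontier` are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(candidates):
    """Return the names of candidates that no other candidate dominates."""
    frontier = []
    for name, scores in candidates.items():
        beaten = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not beaten:
            frontier.append(name)
    return frontier


# Hypothetical per-example scores for three candidates on two examples.
scores = {
    "A": [0.9, 0.2],  # strong on example 1
    "B": [0.3, 0.8],  # strong on example 2
    "C": [0.2, 0.1],  # worse than A on both examples
}
print(pareto_frontier(scores))  # ['A', 'B']
```

A and B both stay on the frontier because each is best on a different example, even though neither has the highest average. That is the sense in which a frontier preserves diverse strengths.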
Three Optimization Modes
| Mode | Flags | Description |
|---|---|---|
| Single-task | --seed + --evaluator | Solve one hard problem. The artifact is the solution. |
| Multi-task | + --dataset | Solve a batch of related problems with cross-transfer. |
| Generalization | + --dataset --valset | Build a skill that transfers to unseen problems. |
Installation
go install github.com/jonniesweb/gepa-cli/cmd/gepa@latest
Or build from source:
git clone https://github.com/jonniesweb/gepa-cli
cd gepa-cli
go build -o gepa ./cmd/gepa
Requirements: Claude Code CLI must be installed and on your PATH.
Usage
gepa optimize [options]
Options:
--seed <file>        Starting artifact (file path, - for stdin, or omit for seedless mode)
--objective <text>   What to optimize for (natural language); required in seedless mode
--background <text>  Domain knowledge and constraints for the LLM
--evaluator <cmd>    Shell command that scores a candidate (required)
--dataset <file>     JSON file: array of training examples (activates multi-task mode)
--valset <file>      JSON file: array of validation examples (activates generalization mode)
--max-calls <n>      Max evaluator calls across all iterations (default: 50)
--output <file>      Write best candidate to file (default: stdout)
--engine <name>      AI engine: claude (default)
-v, --verbose        Verbose output (show LLM prompts and full responses)
-h, --help           Show this help
Evaluator Protocol
The evaluator is run as a shell command with these environment variables:
| Variable | Value |
|---|---|
| GEPA_CANDIDATE_FILE | Path to a temp file containing the candidate |
| GEPA_CANDIDATE | Candidate text (full content) |
| GEPA_EXAMPLE | JSON-encoded example object (empty if single-task) |
| GEPA_ITERATION | Current iteration number |
The evaluator prints to stdout:
{"score": 0.85, "diagnostics": {"Error": "...", "Output": "..."}}
Or just a plain float:
0.85
Higher scores are better (0.0–1.0 recommended). The diagnostics map is passed to the LLM as Actionable Side Information (ASI) — diagnostic feedback the LLM reads to understand failures and propose targeted improvements.
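As a rough sketch of the protocol from the evaluator's side, the Python below reads the environment variables listed above and emits the JSON form. The helper names (`read_inputs`, `format_result`, `toy_score`) and the length-based metric are hypothetical, and clamping the score to 0.0–1.0 is a choice made here rather than something gepa is known to require.

```python
import json
import os


def read_inputs(env):
    """Collect the candidate, the example (None in single-task mode), and the iteration."""
    with open(env["GEPA_CANDIDATE_FILE"], encoding="utf-8") as f:
        candidate = f.read()
    raw_example = env.get("GEPA_EXAMPLE", "")
    example = json.loads(raw_example) if raw_example else None
    return candidate, example, env.get("GEPA_ITERATION", "")


def format_result(score, diagnostics):
    """Clamp the score to [0.0, 1.0] and serialize it with its diagnostics."""
    clamped = max(0.0, min(1.0, float(score)))
    return json.dumps({"score": clamped, "diagnostics": diagnostics})


def toy_score(candidate, max_chars=2000):
    """Illustrative metric only: shorter candidates score higher."""
    return 1.0 - len(candidate) / max_chars


# Only act when the variables are present, i.e. when run as an evaluator.
if __name__ == "__main__" and "GEPA_CANDIDATE_FILE" in os.environ:
    candidate, example, iteration = read_inputs(os.environ)
    diagnostics = {"Length": f"{len(candidate)} chars", "Iteration": iteration}
    print(format_result(toy_score(candidate), diagnostics))
```

Saved under a name of your choosing, a script like this could be passed as `--evaluator 'python3 your_evaluator.py'`. The diagnostics values are kept as short strings so they read well when handed back to the LLM.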
Examples
Single-task: Optimize a Shell Script
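# evaluator: runs the script and scores it by runtime (needs GNU date for %N)
cat > bench.sh << 'EOF'
#!/bin/bash
# Time the candidate in milliseconds; keep its output off stdout,
# which is reserved for the score.
start=$(date +%s%N)
sh "$GEPA_CANDIDATE_FILE" > /dev/null 2>&1
status=$?
elapsed=$(( ($(date +%s%N) - start) / 1000000 ))
# A failing candidate must not win by exiting quickly.
if [ "$status" -ne 0 ]; then
  echo "{\"score\": 0, \"diagnostics\": {\"Error\": \"exit status $status\", \"Runtime\": \"${elapsed}ms\"}}"
  exit 0
fi
# Linear score: 1.0 at 0 ms, falling to 0.0 at 5000 ms or slower.
score=$(python3 -c "print(max(0, 1 - $elapsed / 5000))")
echo "{\"score\": $score, \"diagnostics\": {\"Runtime\": \"${elapsed}ms\"}}"
EOF
chmod +x bench.sh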
gepa optimize \
--seed slow_script.sh \
--objective "Make this script faster" \
--evaluator ./bench.sh \
--max-calls 20 \
--output fast_script.sh
Multi-task: Optimize a Prompt Across Examples
# dataset: array of {input, expected_output} objects
gepa optimize \
--seed prompt.txt \
--objective "Improve accuracy on these reasoning tasks" \
--evaluator 'python3 score_prompt.py' \
--dataset train.json \
--max-calls 40 \
--output best_prompt.txt
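The contents of score_prompt.py are not shown here, so the following is only a sketch of how such an evaluator might be structured. It assumes each example has string `input` and `expected_output` fields, scores by exact match, and leaves the model call as a hypothetical placeholder (`run_model`) that you would replace with your own.

```python
import json
import os


def run_model(prompt, task_input):
    """Hypothetical placeholder: replace with a real call to the model under test."""
    raise NotImplementedError("wire this up to your model of choice")


def score_example(prompt, example, model=run_model):
    """Score one dataset example by exact match and return (score, diagnostics)."""
    try:
        output = str(model(prompt, example["input"]))
    except Exception as exc:
        # Surface the failure so the LLM can read it in the diagnostics.
        return 0.0, {"Error": f"{type(exc).__name__}: {exc}"}
    expected = str(example["expected_output"])
    score = 1.0 if output.strip() == expected.strip() else 0.0
    diagnostics = {
        "Input": str(example["input"]),
        "Expected": expected,
        "Output": output,
    }
    return score, diagnostics


# Only act when the variables are present, i.e. when run as an evaluator.
if __name__ == "__main__" and "GEPA_CANDIDATE_FILE" in os.environ:
    with open(os.environ["GEPA_CANDIDATE_FILE"], encoding="utf-8") as f:
        prompt = f.read()
    example = json.loads(os.environ["GEPA_EXAMPLE"])
    score, diagnostics = score_example(prompt, example)
    print(json.dumps({"score": score, "diagnostics": diagnostics}))
```

Exact match is the simplest possible metric; for free-form answers a partial-credit score would give the optimizer a smoother signal. Returning the expected and actual output in the diagnostics gives the LLM something concrete to act on.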
Generalization: Learn an Agent Skill
gepa optimize \
--seed skill.md \
--objective "Optimize this coding agent skill for the bleve repository" \
--evaluator './run_agent_eval.sh' \
--dataset train_tasks.json \
--valset val_tasks.json \
--max-calls 100 \
--output optimized_skill.md
Seedless Mode: Generate From Scratch
gepa optimize \
--objective "Write a Python function that reverses a string in O(n)" \
--evaluator 'python3 test_reverse.py' \
--max-calls 10 \
--output reverse.py
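The test_reverse.py evaluator is not shown above. One possible shape, as a sketch only: load the candidate source, look for a function, and score it by the fraction of test cases it passes. The function name `reverse_string` and the test cases are assumptions made for illustration; the objective text does not fix a name, so a missing function is reported back through the diagnostics.

```python
import json
import os

# Hypothetical test cases: (input, expected reversed output).
CASES = [("", ""), ("a", "a"), ("abc", "cba"), ("hello world", "dlrow olleh")]


def evaluate(source, func_name="reverse_string"):
    """Run the candidate source and return (score, diagnostics)."""
    namespace = {}
    try:
        exec(source, namespace)  # runs untrusted code; sandbox this in real use
    except Exception as exc:
        return 0.0, {"Error": f"candidate failed to load: {exc!r}"}
    func = namespace.get(func_name)
    if not callable(func):
        return 0.0, {"Error": f"no function named {func_name} was defined"}
    failures = []
    for text, expected in CASES:
        try:
            actual = func(text)
        except Exception as exc:
            actual = f"raised {exc!r}"
        if actual != expected:
            failures.append(f"{text!r}: expected {expected!r}, got {actual!r}")
    score = 1 - len(failures) / len(CASES)
    return score, {"Failures": "; ".join(failures) or "none"}


# Only act when the variables are present, i.e. when run as an evaluator.
if __name__ == "__main__" and "GEPA_CANDIDATE_FILE" in os.environ:
    with open(os.environ["GEPA_CANDIDATE_FILE"], encoding="utf-8") as f:
        score, diagnostics = evaluate(f.read())
    print(json.dumps({"score": score, "diagnostics": diagnostics}))
```

This checks correctness only. It does not verify the O(n) requirement; a timing check on a long input could be added if that matters for your objective.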
Inspiration
This tool is a CLI implementation of the ideas from GEPA's optimize_anything: