EPOCH
Multi-round optimization framework that runs iterative optimize-evaluate loops: investigate failures, implement a fix, evaluate the result, accept or reject with evidence, repeat. Works with both Claude Code and OpenAI Codex.
Installation
Claude Code
```bash
claude plugin marketplace add zhanlin-liu/EPOCH
claude plugin install epoch@zhanlin-liu-EPOCH
```

Then use `/epoch` in any project.
OpenAI Codex
```
$skill-installer install https://github.com/zhanlin-liu/EPOCH/tree/main/skills/epoch
```

Restart Codex after installation, then use `/epoch`.
From Source
```bash
git clone https://github.com/zhanlin-liu/EPOCH.git
cd EPOCH
claude  # or codex
```
Quick Start
Option 1: Guided Setup (No Config)
```
/epoch
```
EPOCH will interview you (3–5 questions) and scaffold the project automatically.
Option 2: Existing Config
```
/epoch projects/my_project_run.yaml
```
EPOCH reads the config, sets up the environment, and starts the optimization loop.
Supported Task Types
| Task Type | What It Optimizes | Example |
|---|---|---|
| `prompt_tune` | LLM prompt text | Sentiment classification with GPT-4.1-nano |
| `finetune` | ML hyperparameters | MobileNetV2 learning rate and optimizer |
| `rule_based` | Rule conditions and thresholds | Iris species classifier |
| `code_improvement` | Algorithm correctness and performance | Fibonacci calculator speed |
How It Works
```
Round 1: Baseline
  └─ Scaffold project → Run evaluation → Save baseline metrics → Open PR

Round 2+: Optimization Loop
  └─ Investigate failures (TRAIN only)
  └─ Implement fix
  └─ Evaluate (EVAL only)
       ├─ ACCEPT → Merge PR → Next round
       └─ REJECT → Retry or next round
```
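The accept/reject logic of rounds 2+ can be sketched as a small loop. This is a minimal illustration, not EPOCH's implementation: `propose_and_evaluate` is a hypothetical callback standing in for "investigate on TRAIN, implement a fix, evaluate on EVAL", and it assumes a higher-is-better metric, that round 1 (the baseline) counts toward `max_rounds`, and that a round is accepted when it beats the current best by at least `min_delta`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoundResult:
    round_num: int
    accepted: bool
    score: float
    attempts: int


def run_loop(
    baseline_score: float,
    propose_and_evaluate: Callable[[int, int], float],  # hypothetical: (round, attempt) -> EVAL score
    max_rounds: int = 5,
    max_retries_per_round: int = 1,
    min_delta: float = 0.01,
) -> list[RoundResult]:
    """Sketch of rounds 2..max_rounds (assumes round 1 is the baseline)."""
    best = baseline_score
    history: list[RoundResult] = []
    for round_num in range(2, max_rounds + 1):
        # One initial attempt plus up to max_retries_per_round retries.
        for attempt in range(1 + max_retries_per_round):
            score = propose_and_evaluate(round_num, attempt)
            if score - best >= min_delta:
                best = score  # ACCEPT: the merged result becomes the new bar
                history.append(RoundResult(round_num, True, score, attempt + 1))
                break
        else:
            # REJECT: retries exhausted, move on to the next round.
            history.append(RoundResult(round_num, False, score, attempt + 1))
    return history
```

Because the bar rises after every accepted round, each merged PR has to improve on the previous accepted state rather than only on the original baseline.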
Each round produces:
- A git branch (`epoch/<slug>/<run-id>/round-N`)
- A pull request with metrics tables and evidence
- Structured JSON artifacts (`baseline_metrics.json`, `delta_round_N.json`)
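The naming patterns in the list can be expressed as two helpers. This is a sketch only: the `run_dir` location and the `run_id` format are hypothetical, and it assumes `baseline_metrics.json` belongs to round 1 while `delta_round_N.json` is written for each later round.

```python
from pathlib import Path


def round_branch(slug: str, run_id: str, round_num: int) -> str:
    """Branch name following the epoch/<slug>/<run-id>/round-N pattern."""
    return f"epoch/{slug}/{run_id}/round-{round_num}"


def round_artifact(run_dir: Path, round_num: int) -> Path:
    """JSON artifact for a round (run_dir is a hypothetical location).

    Assumption: round 1 saves baseline metrics; later rounds save deltas.
    """
    if round_num == 1:
        return run_dir / "baseline_metrics.json"
    return run_dir / f"delta_round_{round_num}.json"
```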
Plugin Structure
```
.claude-plugin/
└── plugin.json
skills/
└── epoch/
    ├── SKILL.md              # Entry point: dispatches by task_type
    ├── agents/               # Role definitions
    │   ├── orchestrator.md
    │   ├── investigator.md
    │   ├── executor.md
    │   ├── baseline_executor.md
    │   ├── seed_planner.md
    │   └── reviewer.md
    └── references/           # Task-type workflows
        ├── prompt_tune.md
        ├── finetune.md
        ├── rule_based.md
        ├── code_improvement.md
        ├── create_project.md
        └── create_skill.md
```
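The tree implies a one-to-one mapping from each `task_type` to a workflow file under `references/`. `SKILL.md` is a Markdown skill file, so the Python below is only a model of that lookup for readers, not code that ships with the plugin; `workflow_for` is a hypothetical name.

```python
from pathlib import Path

# The four task types from the task-type table; each has a workflow file.
TASK_TYPES = ("prompt_tune", "finetune", "rule_based", "code_improvement")


def workflow_for(task_type: str, skill_root: Path = Path("skills/epoch")) -> Path:
    """Resolve the reference workflow a task_type dispatches to (illustrative)."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    return skill_root / "references" / f"{task_type}.md"
```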
Writing a Config
Create `projects/<slug>_run.yaml`. See `projects/` for example configs.
ML Tasks (prompt_tune, finetune, rule_based)
```yaml
project:
  name: "My Classifier"
  slug: my_classifier

run:
  id: null                      # auto-generated at runtime
  goal: "Optimize classification accuracy"
  task_type: rule_based         # or prompt_tune, finetune
  max_rounds: 5
  max_retries_per_round: 1

env:
  manager: uv
  path: "projects/my_classifier"

evaluation:
  primary_metric: accuracy
  min_delta: 0.01               # minimum improvement to accept a round
  deterministic: true
  train_cmd: "python projects/my_classifier/evaluate.py train"
  eval_cmd: "python projects/my_classifier/evaluate.py eval"

git:
  push_to_remote: true
  create_prs: true
  target_branch: develop
```
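`train_cmd` and `eval_cmd` point at one script that takes the split as an argument. The sketch below shows what such an `evaluate.py` could look like for a rule-based classifier. The toy data is invented for illustration, and printing a JSON object keyed by `primary_metric` is an assumption; check the example projects in `projects/` for the output format EPOCH actually reads.

```python
import json
import sys

# Hypothetical toy data: (petal_length_cm, label). Not real Iris splits.
TRAIN = [(1.4, "setosa"), (1.3, "setosa"), (4.5, "versicolor"), (4.7, "versicolor"),
         (5.0, "versicolor"), (5.8, "virginica"), (6.1, "virginica")]
EVAL = [(1.5, "setosa"), (4.2, "versicolor"), (4.9, "versicolor"), (5.6, "virginica")]


def classify(petal_length: float) -> str:
    """The rules EPOCH would tune: thresholds on a single feature."""
    if petal_length < 2.5:
        return "setosa"
    if petal_length < 4.8:
        return "versicolor"
    return "virginica"


def evaluate(split: str) -> dict:
    """Score one split; keyed by the config's primary_metric (assumed format)."""
    data = {"train": TRAIN, "eval": EVAL}[split]
    correct = sum(classify(x) == y for x, y in data)
    return {"accuracy": correct / len(data)}


if __name__ == "__main__":
    split = sys.argv[1] if len(sys.argv) > 1 else "eval"
    print(json.dumps(evaluate(split)))
```

Here the 4.8 cm threshold misclassifies one versicolor example in each split; the investigator would only see the TRAIN miss (5.0 cm), and the fix would be judged on EVAL accuracy.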
Code Improvement
```yaml
project:
  name: "My Algorithm"
  slug: my_algo

run:
  task_type: code_improvement
  max_rounds: 5
  max_retries_per_round: 1

env:
  manager: uv
  path: "projects/my_algo"

evaluation:
  primary_metric: execution_time
  min_delta: 0.05               # 5% minimum speedup per round
  deterministic: true
  cmd: "python projects/my_algo/main.py"
  test_cmd: "pytest projects/my_algo/tests/"

git:
  push_to_remote: true
  create_prs: true
  target_branch: develop
```
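For `execution_time`, lower is better and the config comment reads `min_delta: 0.05` as a 5% speedup, i.e. a relative change against the baseline. The sketch below times a command and applies that rule. Both function names are hypothetical, taking the median of several runs is a design choice to damp timing noise, and requiring `test_cmd` to pass before accepting is an assumption.

```python
import statistics
import subprocess
import time


def time_command(cmd: list[str], repeats: int = 5) -> float:
    """Median wall-clock seconds for running cmd (median damps noise)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def round_accepted(tests_passed: bool, baseline_s: float, candidate_s: float,
                   min_delta: float = 0.05) -> bool:
    """Accept only if tests pass and time dropped by >= min_delta of baseline."""
    speedup = (baseline_s - candidate_s) / baseline_s
    return tests_passed and speedup >= min_delta
```

For example, `time_command(["python", "projects/my_algo/main.py"])` would time the configured `cmd`; a drop from 2.0 s to 1.8 s is a 10% speedup and passes, while 2.0 s to 1.95 s (2.5%) does not.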
Hyperparameter Tuning (finetune)
Add `ml` and `tune` sections to control the search space:

```yaml
ml:
  base_model: mobilenetv2
  framework: pytorch
  seed: 42
  max_train_epochs: 3

tune:
  optimizer: [adam, adamw, sgd]     # discrete options
  learning_rate: [0.0001, 0.01]     # continuous range [min, max]
```
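The `tune` section mixes two shapes: a list of options and a `[min, max]` range. One way to read them is sketched below, assuming that a list of exactly two numbers is a range and anything else is a discrete choice. Sampling ranges log-uniformly is a swapped-in choice common for learning rates; EPOCH's agents may pick values by reasoning over failures rather than by random sampling.

```python
import math
import random


def sample_config(tune: dict, rng: random.Random) -> dict:
    """Draw one candidate from a tune section (assumed conventions).

    - a list of exactly two numbers -> continuous [min, max] range,
      sampled log-uniformly (bounds must be positive)
    - any other list -> discrete choice
    """
    config = {}
    for name, spec in tune.items():
        numeric = all(isinstance(v, (int, float)) for v in spec)
        if numeric and len(spec) == 2:
            lo, hi = spec
            config[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        else:
            config[name] = rng.choice(spec)
    return config
```

Seeding the `random.Random` instance (for example with `ml.seed`) keeps the draw reproducible.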
Prompt Tuning (prompt_tune)
Add `llm` and `tune` sections:

```yaml
llm:
  model: "gpt-4.1-nano"
  async: true
  concurrency: 12

tune:
  strategy: ["few shots", "chain of thought"]
```
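`async: true` with `concurrency: 12` suggests LLM calls run concurrently with a cap on how many are in flight. A standard way to do that in Python is an `asyncio.Semaphore`, sketched below. `classify` is a hypothetical stand-in for a real model call and does no network I/O.

```python
import asyncio


async def classify(text: str) -> str:
    """Hypothetical stand-in for one LLM call (e.g. to gpt-4.1-nano)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return "positive" if "good" in text else "negative"


async def classify_all(texts: list[str], concurrency: int = 12) -> list[str]:
    """Run all calls concurrently, keeping at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(text: str) -> str:
        async with sem:
            return await classify(text)

    # gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(bounded(t) for t in texts)))
```

Usage: `asyncio.run(classify_all(["good movie", "bad plot"]))` returns `["positive", "negative"]`.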
Key Concepts
Train/Eval Separation
For ML tasks, EPOCH enforces strict separation:
- Investigation: analyzes failures on the TRAIN split only
- Evaluation: accepts or rejects based on the EVAL split only

Because EVAL examples never inform a fix, this prevents overfitting to the evaluation set.

For `code_improvement`, there is no split: all tests are visible.
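The separation rule comes down to which data each role may touch. The sketch below makes that explicit with a seeded, deterministic split. All names are hypothetical, and whether EPOCH or the project's own `evaluate.py` owns the split is an assumption (the `train_cmd`/`eval_cmd` pair suggests the project script does).

```python
import random


def split_examples(examples, eval_fraction=0.3, seed=42):
    """Deterministically shuffle, then split into (train, eval)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def train_failures(predict, train):
    """What the investigator may inspect: misclassified TRAIN examples only."""
    return [(x, y) for x, y in train if predict(x) != y]


def eval_score(predict, eval_set):
    """What the accept/reject decision uses: aggregate EVAL accuracy only."""
    return sum(predict(x) == y for x, y in eval_set) / len(eval_set)
```

The investigator sees individual TRAIN failures, while the reviewer only sees a single EVAL number, so no EVAL example can be hand-fixed.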
Acceptance Criteria