Multi-round optimization loop for improving prompts, rules, hyperparameters, and code through evidence-based iteration. Use this skill when the user mentions "epoch" or invokes /epoch. Reads an epoch_run.yaml config to determine the task type and dispatches to the appropriate workflow.
Slash command: /epoch:epoch
EPOCH runs an iterative optimization loop: Investigate failures, implement a fix, evaluate the result, accept or reject with evidence, repeat.
Invoke /epoch with an optional epoch_run.yaml config file.

Config provided (e.g. /epoch projects/wine_run.yaml):
1. Read epoch_run.yaml.
2. If env is configured, ensure the environment is set up (e.g. uv sync in env.path). Wrap all commands (evaluation.cmd, evaluation.test_cmd, evaluation.train_cmd/evaluation.eval_cmd) with the env manager (e.g. uv run --project <env.path> <cmd>).
3. Dispatch on task_type from the config.

No config (e.g. /epoch or "epoch, classify wine cultivars"):
1. Load references/create_project.md.
2. Create projects/<slug>_run.yaml and scaffold the project files using ONLY the templates in create_project.md. Do NOT read or reference other project folders, and do NOT check git log or git history.

Read epoch_run.yaml and dispatch based on run.task_type:
| task_type | Reference to Load | Description |
|---|---|---|
| prompt_tune | references/prompt_tune.md | Optimize LLM prompts |
| finetune | references/finetune.md | Tune ML hyperparameters |
| rule_based | references/rule_based.md | Optimize rule-based systems |
| code_improvement | references/code_improvement.md | Fix bugs, optimize performance |
If no config is provided, load references/create_project.md to interview the user and scaffold the project.
If task_type is not recognized, load references/create_skill.md to generate a new task-type reference.
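The dispatch rules and the env-wrapping rule can be read as two small pure functions. This is a minimal sketch, assuming the config has already been parsed into a dict; resolve_reference and wrap_command are hypothetical helpers, and only the uv manager is handled because that is the only one the config examples show.

```python
from typing import Optional

# Routing table from the skill: task_type -> reference file to load.
TASK_REFERENCES = {
    "prompt_tune": "references/prompt_tune.md",
    "finetune": "references/finetune.md",
    "rule_based": "references/rule_based.md",
    "code_improvement": "references/code_improvement.md",
}


def resolve_reference(config: Optional[dict]) -> str:
    """Return the reference file to load for a parsed epoch_run.yaml (or None)."""
    if config is None:
        # No config: interview the user and scaffold a new project.
        return "references/create_project.md"
    task_type = config.get("run", {}).get("task_type")
    # Unrecognized task types fall back to generating a new task-type reference.
    return TASK_REFERENCES.get(task_type, "references/create_skill.md")


def wrap_command(cmd: str, config: dict) -> str:
    """Prefix an evaluation command with the env manager, if one is configured.

    Assumption: only env.manager == "uv" is handled; any other setup
    leaves the command unchanged in this sketch.
    """
    env = config.get("env")
    if env and env.get("manager") == "uv":
        return f"uv run --project {env['path']} {cmd}"
    return cmd
```

For example, a rule_based config with env.path set to projects/wine resolves to references/rule_based.md, and its train_cmd becomes uv run --project projects/wine python projects/wine/evaluate.py train.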
Agents define role behavior, permissions, and constraints. Load them as needed during the workflow.
| Agent | When to Load | Purpose |
|---|---|---|
| agents/orchestrator.md | Always | Coordinates rounds, manages branches and PRs |
| agents/seed_planner.md | Round 1 | Designs baseline evaluation approach |
| agents/baseline_executor.md | Round 1 | Implements evaluation infrastructure |
| agents/investigator.md | Rounds 2+ | Analyzes failures, proposes changes |
| agents/executor.md | Rounds 2+ | Implements changes, commits |
| agents/reviewer.md | Rounds 2+ | Evaluates results, accepts/rejects |
Users may add custom agents or exclude agents from this list based on their needs.
These apply across all task types.
Run IDs use the format run-YYYYMMDD-HHMM. Round branches are named epoch/<project_slug>/<run_id>/round-<N>.

Config and task files are organized as:
projects/
├── <slug>_run.yaml # epoch config (sibling to task folder)
└── <slug>/ # task folder
├── evaluate.py # ML tasks: train/eval metric runner
├── tests/ # code_improvement: test suite
├── rules/ # or other task-specific files
└── <run_id>/ # run artifacts
├── baseline_metrics.json
├── proposed_metrics.json
├── delta_round_N.json
├── pr_body.md
└── run_summary.md
Never write outputs to the repository root.
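Run IDs look like run-YYYYMMDD-HHMM, round branches like epoch/<project_slug>/<run_id>/round-<N>, and run artifacts live under projects/<slug>/<run_id>/. A minimal sketch of building those names, assuming local time is used for the timestamp; make_run_id, round_branch, and run_artifact_dir are hypothetical helpers, not part of the skill.

```python
from datetime import datetime
from pathlib import Path


def make_run_id(now: datetime) -> str:
    # run-YYYYMMDD-HHMM, e.g. run-20240315-0930
    return now.strftime("run-%Y%m%d-%H%M")


def round_branch(project_slug: str, run_id: str, round_n: int) -> str:
    # epoch/<project_slug>/<run_id>/round-<N>
    return f"epoch/{project_slug}/{run_id}/round-{round_n}"


def run_artifact_dir(slug: str, run_id: str, projects_root: str = "projects") -> Path:
    # Artifacts go to projects/<slug>/<run_id>/, never the repository root.
    return Path(projects_root) / slug / run_id
```

Deriving every path from the slug and run ID keeps baseline_metrics.json, delta_round_N.json, and run_summary.md for different runs from colliding.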
Round titles follow this format:
[Round 1] (Baseline: <metric>=<value>) Initial <artifact>
[Round N] (<metric>: <old> -> <new>) <brief summary>

A round is accepted when:
- The primary metric improves by at least min_delta (from config).

Every rejection must include:
No subjective rejections. "Doesn't seem right" is not valid.
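The title format and the min_delta acceptance rule are mechanical enough to sketch. This assumes min_delta is an absolute improvement in the primary metric and adds a hypothetical higher_is_better flag, since a metric like execution_time improves by going down; round_title and is_accepted are illustrative names only.

```python
from typing import Optional


def round_title(round_n: int, metric: str, value: float,
                summary: str, old: Optional[float] = None) -> str:
    """Build a round title; for round 1, `value` is the baseline metric."""
    if round_n == 1:
        # `summary` names the initial artifact, e.g. "rules" or "prompt".
        return f"[Round 1] (Baseline: {metric}={value}) Initial {summary}"
    return f"[Round {round_n}] ({metric}: {old} -> {value}) {summary}"


def is_accepted(old: float, new: float, min_delta: float,
                higher_is_better: bool = True) -> bool:
    # Assumption: min_delta is an absolute (not relative) improvement.
    improvement = new - old if higher_is_better else old - new
    return improvement >= min_delta
```

A check like this only covers the numeric threshold; the evidence each rejection must carry still comes from the reviewer.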
When a round is rejected and retries remain:
For ML tasks (prompt_tune, finetune, rule_based):
For code_improvement: All tests are visible (no split).
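For ML tasks the investigator analyzes TRAIN failures while the reviewer scores on EVAL, so the split has to be fixed and reproducible across rounds. A minimal sketch, assuming a seeded shuffle of example items; the 20% eval fraction and seed 0 are hypothetical defaults, not EPOCH settings.

```python
import random
from typing import List, Sequence, Tuple


def train_eval_split(items: Sequence, eval_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List, List]:
    """Deterministically split items into (train, eval)."""
    shuffled = list(items)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]
```

Because the same seed always yields the same partition, an EVAL improvement cannot come from examples leaking into TRAIN between rounds.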
Round 1 (baseline):
1. agents/seed_planner.md: design the evaluation approach.
2. agents/baseline_executor.md: implement and run the baseline.
3. Record the result in baseline_metrics.json.

For each subsequent round:
1. agents/investigator.md + task reference: analyze failures on TRAIN and propose changes.
2. agents/executor.md: apply the changes, commit, and push.
3. agents/reviewer.md: run EVAL, compare metrics, and decide accept/reject.

After all rounds, generate a run summary with the full metrics progression.
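The round structure can be summarized as a loop in which callables stand in for the agents. This is a hypothetical sketch, not the orchestrator's implementation: it assumes max_rounds counts the baseline round, that each round gets one attempt plus max_retries_per_round retries, and it leaves out reverting rejected changes and all git/PR handling.

```python
from typing import Callable, Dict, List


def run_epoch(
    baseline: float,
    propose_and_apply: Callable[[int, int], None],
    evaluate: Callable[[], float],
    accept: Callable[[float, float], bool],
    max_rounds: int,
    max_retries_per_round: int,
) -> List[Dict]:
    """Hypothetical orchestration loop; the callables stand in for the agents."""
    # Round 1 is the baseline produced by seed_planner + baseline_executor.
    history: List[Dict] = [{"round": 1, "attempt": 0, "metric": baseline, "accepted": True}]
    current = baseline
    for round_n in range(2, max_rounds + 1):
        for attempt in range(max_retries_per_round + 1):
            propose_and_apply(round_n, attempt)    # investigator + executor (TRAIN)
            candidate = evaluate()                 # reviewer runs EVAL
            accepted = accept(current, candidate)  # reviewer's accept/reject call
            history.append({"round": round_n, "attempt": attempt,
                            "metric": candidate, "accepted": accepted})
            if accepted:
                current = candidate
                break
    return history


def summarize(history: List[Dict]) -> str:
    """Render the full metrics progression as a Markdown table."""
    lines = ["| round | attempt | metric | accepted |", "|---|---|---|---|"]
    for h in history:
        lines.append(f"| {h['round']} | {h['attempt']} | {h['metric']} | {h['accepted']} |")
    return "\n".join(lines)
```

Recording rejected attempts alongside accepted ones keeps the summary honest about how many tries each improvement took.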
Each project needs an epoch_run.yaml. The config structure differs by task type:
ML tasks (prompt_tune, finetune, rule_based) use an evaluation: section with a train/eval split (train_cmd and eval_cmd):
project:
name: "Project Name"
slug: "project_slug"
run:
task_type: "rule_based" # prompt_tune | finetune | rule_based
max_rounds: 10
max_retries_per_round: 2
env:
manager: uv
path: "projects/<slug>"
evaluation:
primary_metric: "precision"
min_delta: 0.01
deterministic: true
train_cmd: "python projects/<slug>/evaluate.py train"
eval_cmd: "python projects/<slug>/evaluate.py eval"
git:
push_to_remote: true
create_prs: true
target_branch: "develop"
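The train_cmd and eval_cmd above call evaluate.py with a split name. One hypothetical shape for that script is sketched below: placeholder data and a placeholder predictor, computing the configured primary_metric (precision) and printing it as JSON. The JSON-on-stdout output format and every function name here are assumptions; the real script comes from the task reference templates.

```python
"""Hypothetical skeleton for projects/<slug>/evaluate.py (train | eval)."""
import argparse
import json
from typing import List, Tuple


def load_split(split: str) -> List[Tuple[str, int]]:
    # Placeholder (input, label) data; a real project loads its own examples.
    data = {
        "train": [("a", 1), ("b", 0), ("c", 1)],
        "eval": [("d", 1), ("e", 0)],
    }
    return data[split]


def predict(x: str) -> int:
    # Placeholder for the system being optimized (prompt, model, or rules).
    return 1


def precision(pairs: List[Tuple[int, int]]) -> float:
    # pairs are (predicted, actual); precision = TP / (TP + FP).
    predicted_pos = [(p, y) for p, y in pairs if p == 1]
    if not predicted_pos:
        return 0.0
    return sum(1 for _, y in predicted_pos if y == 1) / len(predicted_pos)


def main(argv=None) -> dict:
    parser = argparse.ArgumentParser()
    parser.add_argument("split", choices=["train", "eval"])
    args = parser.parse_args(argv)
    pairs = [(predict(x), y) for x, y in load_split(args.split)]
    metrics = {"split": args.split, "precision": precision(pairs)}
    print(json.dumps(metrics))
    return metrics


if __name__ == "__main__":
    main()
```

Keeping both splits behind one entry point means the investigator and reviewer run identical scoring code and differ only in which data they see.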
Code improvement uses an evaluation: section with cmd (the program under test) and test_cmd (the test runner):
project:
name: "Project Name"
slug: "project_slug"
run:
task_type: "code_improvement"
max_rounds: 5
max_retries_per_round: 1
env:
manager: uv
path: "projects/<slug>"
evaluation:
primary_metric: "execution_time"
min_delta: 0.05
deterministic: true
cmd: "python projects/<slug>/main.py"
test_cmd: "pytest projects/<slug>/tests/"
git:
push_to_remote: true
create_prs: true
target_branch: "develop"
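With primary_metric: execution_time, something has to turn cmd into a number. A minimal sketch, assuming whole-process wall-clock time is what gets measured and that the median of a few runs is an acceptable noise filter; measure_execution_time and repeats are hypothetical, and correctness is still judged separately by test_cmd.

```python
import statistics
import subprocess
import sys
import time
from typing import List


def measure_execution_time(cmd: List[str], repeats: int = 5) -> float:
    """Median wall-clock seconds for `cmd`; raises if the program exits non-zero."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Time a trivial Python process as a smoke test.
    print(measure_execution_time([sys.executable, "-c", "pass"], repeats=3))
```

Since cmd in the config is a single string, it would need splitting first (e.g. shlex.split("python projects/<slug>/main.py")), after any env-manager wrapping.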
Task-specific config sections (llm:, ml:, rules:) are documented in the corresponding reference file.
Do not use git log, git show, or any other git history command to look at past runs or projects. Start fresh from the templates and the user's input only.
Only create or modify files in projects/<slug>/ and projects/<slug>_run.yaml. Never scan, read, or reference other project folders, not even to "check patterns" or "follow conventions". Each project is scaffolded from the templates in create_project.md, not copied from siblings.

Install: npx claudepluginhub zhanlin-liu/epoch --plugin epoch
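The rule that a project only touches projects/<slug>/ and projects/<slug>_run.yaml can be checked mechanically before any write. A minimal sketch with a hypothetical is_in_project_scope helper, assuming repository-relative POSIX paths; it rejects any path containing ".." rather than trying to normalize it.

```python
from pathlib import PurePosixPath


def is_in_project_scope(path: str, slug: str) -> bool:
    """True if `path` is projects/<slug>_run.yaml or lies in projects/<slug>/."""
    p = PurePosixPath(path)
    if ".." in p.parts:
        return False  # reject traversal outright instead of resolving it
    config = PurePosixPath("projects") / f"{slug}_run.yaml"
    folder = PurePosixPath("projects") / slug
    return p in (config, folder) or folder in p.parents
```

Comparing path components rather than string prefixes keeps a sibling such as projects/wine_extra/ from passing as projects/wine/.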