From agent-harness
Plan, execute, and review multi-agent harnesses using the Planner-Generator-Evaluator pattern (Anthropic Engineering, 2026-04-04). Use when a task exceeds one context window, benefits from parallel subagents, needs Playwright MCP browser-level verification, or requires sprint-contract negotiation between Generator and Evaluator. Also use when routing work across Opus (orchestration/eval) / Sonnet (generation) / Haiku (collection), deciding spawn-vs-inline-vs-Agent-Teams, designing context reset vs compaction handoffs, or reviewing whether a knowledge unit belongs in a hook, skill, rule, reference, or prompt. Do NOT use for: single-file edits, simple Q&A, or work that /sprint already templates end-to-end.
How this skill is triggered — by the user, by Claude, or both
Slash command
/agent-harness:harness-engineering [task description, design question, or skill/hook/rule under review][task description, design question, or skill/hook/rule under review]This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
A flexible framework for the Planner-Generator-Evaluator (P-G-E) design pattern as
A flexible framework for the Planner-Generator-Evaluator (P-G-E) design pattern as
formalized by Anthropic Engineering on 2026-04-04. Where /sprint is a fixed 6-phase
pipeline producing .sprint/<ts>/ artifacts, this skill helps you reason about,
construct, or review harnesses that don't fit a rigid template — including the
harness elements (skills/rules/hooks/references) that surround them.
Every component in a harness encodes an assumption about what the model can't do on its own — and those assumptions are worth stress-testing. (Anthropic, 2026-04-04)
The harness is durable goods; the model is a replaceable engine. As models strengthen, remove non-load-bearing scaffolding. As failure modes appear, add elements that survive model swaps (Hooks > Skills > Rules > References > Prompts on the durability axis).
$ARGUMENTS
If empty: ask whether the user wants (a) execution of a P-G-E task, (b) design review of an existing skill/rule/hook, (c) routing/decomposition advice, or (d) diagnosis of a failed harness run — then proceed.
State the chosen mode in one line before proceeding. Do not collapse modes — they have different downstream procedures.
| Mode | Trigger | Goes to |
|---|---|---|
| Execute | "do X using multi-agent / subagents / P-G-E" | Step 2 |
| Design | "should this be a hook / skill / rule?" or "review this skill" | Step 6 |
| Route | "which model should handle X?" or "spawn subagent or inline?" | Step 5 |
| Diagnose | "why did the harness fail?" / context drift / stale handoff | Step 7 |
Apply Planner discipline. The Planner's job is to expand a brief prompt (1–4 sentences) into a detailed spec — not to write code. Output:
parallel_batch; with deps go to
sequential_tasks.If the goal is genuinely a single task, report it and skip P-G-E — do not force-fit.
Before any Generator runs, produce a sprint-contract.md with:
Generators read this contract; Evaluators grade against it. The contract is the schema between roles — without it, Self-Evaluation Weakness creeps back in.
See references/model-routing.md. Quick rule:
| Work | Model |
|---|---|
| Plan, evaluate, architectural judgment | Opus |
| Code, write, research synthesize | Sonnet |
| Collect, transform, fetch | Haiku |
Override only with reason in the plan.
See references/subagent-decision.md. Summary:
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1Verify Agent Teams via echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS before assuming
parallelism. Empty → fall back to sequential and note it.
Execute mode: spawn the chosen agents with full cold-start prompts (embed context, don't pass file paths alone). After all return, run Evaluator (yourself or fresh subagent) against the sprint contract. If FAIL and budget remains, return to Step 2 with only failed tasks. Cap at 3 iterations.
Design mode: read references/durability.md and apply the harness-element
decision tree. Output the recommendation with the rule from the tree that justifies it.
Route mode: answer with the model + one-sentence reason, then stop.
If diagnosing a failed harness run, read references/anti-patterns.md and match the
symptoms to one of the canonical failure modes:
Report the matched mode and the prescribed fix from anti-patterns.md.
sprint-contract.md criteria, not against vibesecho $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS; empty → sequential fallback (note it, don't abort)collect (fetch/transform), never for research (synthesize)/sprint is one template built on this framework; if the task is batch-implementation-shaped, prefer /sprint over re-deriving the pipelinereferences/pattern.md — Anthropic 2026-04-04 P-G-E architecture: roles, sprint contracts, Playwright MCP, context reset vs compactionreferences/model-routing.md — Opus/Sonnet/Haiku routing with cost reasoningreferences/subagent-decision.md — Inline vs subagent vs Agent Teams decision tree, OODA-loop framingreferences/durability.md — Harness Engineering philosophy, 5-Layer architecture, hook/skill/rule/reference decision treereferences/anti-patterns.md — 9 canonical harness failure modes with matched fixesRead references on demand. Do not preload them all at the start of every invocation.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub bouob/claude-plugins --plugin agent-harness