From harness
Use to execute the harness loop on a project that has a .harness/ folder. Triggers on "/run-harness", "run the harness", "kick off the harness on [X]", "resume the harness". You — Claude Code — are the orchestrator. Each role (planner, generator, evaluator) is a subagent invocation via the Task tool. State lives on disk under .harness/state/.
Slash command
/harness:run-harness
You are the orchestrator. Each role is a fresh subagent invocation with its own context window. State communication is through JSON files on disk. You do not roleplay the planner, generator, or evaluator — you dispatch them and read what they wrote.
iteration_start event in state/progress.jsonl.
Check that .harness/config.json exists. If not, stop and suggest /instantiate-harness.
Check that .harness/calibration/ is non-empty. If it has no examples (only README.md), warn loudly: evaluator quality without calibration is poor (principle P6). Ask before proceeding.
The plugin root is ~/.claude/plugins/harness/ (or wherever it is installed); resolve relative paths from there.
Read .harness/config.json into memory. Note models.{planner,generator,evaluator}, budget, pivot.consecutive_failure_threshold, pivot.plateau_threshold. Do not log iteration_start during setup; that event is written once per iteration.
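The setup checks above could be scripted roughly as follows. This is a minimal sketch, not part of the plugin: the function name `preflight` and the split into hard errors and soft warnings are assumptions made for illustration.

```python
from pathlib import Path


def preflight(project: Path) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the setup checks (hypothetical helper)."""
    errors: list[str] = []
    warnings: list[str] = []
    harness = project / ".harness"

    # Hard stop: without config.json there is nothing to run.
    if not (harness / "config.json").is_file():
        errors.append("missing .harness/config.json; run /instantiate-harness first")

    # Soft stop: a calibration folder holding only README.md counts as empty.
    calibration = harness / "calibration"
    examples = (
        [p for p in calibration.iterdir() if p.name != "README.md"]
        if calibration.is_dir()
        else []
    )
    if not examples:
        warnings.append("no calibration examples; evaluator quality will be poor (P6)")

    return errors, warnings
```

An orchestrator would stop on any error and ask the user before continuing past a warning.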
If state/plan.json does not exist OR the user did not pass --resume, dispatch the planner subagent:
Task {
subagent_type: general-purpose,
model: <config.models.planner>,
description: "Plan harness scope",
prompt: """
You are the planner. Read your system prompt at .harness/planner.md and follow it.
Your output: write state/plan.json matching runtime/schemas/plan.schema.json.
The user prompt: <config.user_prompt>
The rubric is at .harness/rubric.json.
Do not produce more than 8 deliverables. Do not pick stack/tools/structure.
When done, print "PLAN_WRITTEN" and the path.
"""
}
Validate the written state/plan.json against the schema (call python <plugin-root>/runtime/cli/validate.py plan state/plan.json). Append a plan_written line to state/progress.jsonl.
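Appending a line to state/progress.jsonl might look like the sketch below. The helper name `append_event` is hypothetical, and the timestamp format is an assumption: the document shows a "t" key on event lines but does not say how its value is formatted, so ISO-8601 UTC is used here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(progress: Path, event: str, **fields) -> dict:
    """Append one JSON object as a single line to the progress log."""
    record = {
        # Assumption: "t" is an ISO-8601 UTC timestamp.
        "t": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **fields,
    }
    progress.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier events intact; one record per line.
    with progress.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

For example, `append_event(Path("state/progress.jsonl"), "plan_written")` adds one line, and `append_event(path, "iteration_start", iter=1)` adds the per-iteration marker.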
For iteration N from 1 to budget.max_iterations:
Append {"t":..., "event":"iteration_start", "iter": N} to state/progress.jsonl.
Dispatch the generator subagent to propose a contract:
Task {
description: "Propose contract iter N",
prompt: """
You are the generator. Read your system prompt at .harness/generator.md and follow it.
Read state/plan.json and .harness/rubric.json.
Propose a contract for iteration N. Write state/contracts/iter-{N:03d}.json
matching runtime/schemas/contract.schema.json. Do NOT build yet.
When done, print "CONTRACT_PROPOSED" and the path.
"""
}
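The `{N:03d}` placeholder in the contract path reads like a Python format spec, which zero-pads the iteration number to three digits. A small sketch, assuming that interpretation (the helper name `contract_path` is hypothetical):

```python
def contract_path(iteration: int) -> str:
    """Build the contract path; :03d zero-pads the number to three digits."""
    return f"state/contracts/iter-{iteration:03d}.json"


print(contract_path(7))   # state/contracts/iter-007.json
print(contract_path(42))  # state/contracts/iter-042.json
```

Zero-padding keeps the files in iteration order when a directory listing is sorted by name.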
Then dispatch the evaluator subagent to critique:
Task {
description: "Critique contract iter N",
prompt: """
You are the evaluator. Read your system prompt at .harness/evaluator.md and follow it.
Read state/contracts/iter-{N:03d}.json. Either approve it or critique it.
If approve: set "agreed_by": ["generator","evaluator"] and "agreed_at": <ISO>
in the same file and write it back. Print "CONTRACT_AGREED".
If critique: write state/handoffs/handoff-{seq}.json with kind "contract_critique"
and the specific issues. Print "CONTRACT_CRITIQUED".
"""
}
If critiqued, dispatch generator again to revise. Cap at 3 negotiation rounds. After 3, force-agree with a log note.
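The negotiation cap can be sketched as a bounded loop. This is an illustration under stated assumptions: one "round" is taken to mean one generator proposal (or revision) followed by one evaluator critique, and the `propose` and `critique` callables are hypothetical stand-ins for the two Task dispatches.

```python
from typing import Callable


def negotiate(
    propose: Callable[[int], None],
    critique: Callable[[int], bool],
    max_rounds: int = 3,
) -> tuple[int, bool]:
    """Run propose/critique rounds; return (rounds_used, forced).

    critique(round_no) returns True when the evaluator approved the contract.
    """
    for round_no in range(1, max_rounds + 1):
        propose(round_no)        # generator writes or revises the contract
        if critique(round_no):   # evaluator approved
            return round_no, False
    # Cap reached without approval: force-agree; the caller logs the note.
    return max_rounds, True
```

When the second element of the result is True, the orchestrator would record the forced agreement in progress.jsonl before appending contract_agreed.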
Append contract_agreed to progress.jsonl.
Task {
description: "Build artifact iter N",
prompt: """
You are the generator. Read your system prompt at .harness/generator.md and follow it.
Read state/contracts/iter-{N:03d}.json (the agreed contract).
Build the artifact. When done, write state/handoffs/handoff-{seq}.json with
kind "artifact_ready" and a per-criterion self-check (HINT only, not verdict).
Print "ARTIFACT_READY" and the artifact path(s).
"""
}
Append artifact_ready to progress.jsonl.
Task {
description: "Grade iter N",
prompt: """
You are the evaluator. Read your system prompt at .harness/evaluator.md and follow it.
Read state/contracts/iter-{N:03d}.json, .harness/rubric.json, and the calibration
examples at .harness/calibration/.
Operate the artifact — run every verification action listed in the contract.
Do not grade from description.
Write state/verdicts/verdict-{N:03d}.json matching runtime/schemas/verdict.schema.json.
Print "VERDICT_WRITTEN" and iteration_pass.
"""
}
Validate verdict against schema. Append verdict line to progress.jsonl with iter, pass, and failing criterion IDs.
Run the pivot check:
python <plugin-root>/runtime/cli/check_pivot.py --project . --iteration N
The script reads recent verdicts from state/verdicts/ and returns one of:
PASS → log halt_success, exit
ITERATE → continue to iteration N+1
PIVOT <criterion> → archive iteration N, log pivot, continue to N+1 with fresh-context instruction
BUDGET → log halt_budget, exit
If PIVOT: dispatch generator at N+1 with explicit "previous attempt archived under state/archive/iter-{N:03d}; take a different approach" in the prompt.
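Mapping the pivot check's result to a loop action could look like the following sketch. It assumes the result arrives as a single text line such as `PIVOT <criterion>`; how check_pivot.py actually reports its result (stdout, exit code, or a file) is not stated in the document, and the names `PivotDecision` and `parse_pivot_output` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PivotDecision:
    outcome: str              # PASS, ITERATE, PIVOT, or BUDGET
    criterion: Optional[str]  # only set for PIVOT
    halt: bool                # True when the loop should exit


def parse_pivot_output(line: str) -> PivotDecision:
    """Parse one result line (assumed format) into a decision."""
    parts = line.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("empty pivot result")
    outcome = parts[0]
    if outcome not in {"PASS", "ITERATE", "PIVOT", "BUDGET"}:
        raise ValueError(f"unexpected pivot result: {line!r}")
    # Only PIVOT carries a criterion; ignore trailing text on other outcomes.
    criterion = parts[1] if outcome == "PIVOT" and len(parts) > 1 else None
    # PASS and BUDGET end the run; ITERATE and PIVOT continue to N+1.
    return PivotDecision(outcome, criterion, halt=outcome in {"PASS", "BUDGET"})
```

Rejecting unknown outcomes with an exception keeps the loop from silently continuing on a malformed result.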
At meaningful events (verdict, pivot, halt), print a one-line summary. Don't surface every handoff — surface the structural events only. At halt, print final artifact path(s) and the line count from progress.jsonl.
/run-harness: planner runs, then iterate.
/run-harness --resume: read last_event from progress.jsonl, start at next iteration.
/run-harness --iteration N: re-run only that iteration (useful for debugging a specific failure after editing a prompt).
/run-harness --dry: python runtime/cli/validate.py config .harness/config.json && python runtime/cli/validate.py rubric .harness/rubric.json, with no role dispatch.
progress.jsonl reflects every meaningful event in real time.
consecutive_failure_threshold times, not because of vibes.
Each subagent invocation gets its own context window. The evaluator never sees the generator's reasoning, only what's on disk. That's the adversarial pressure principle P2 made concrete. The file system is the message bus, not your conversation context.
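For --resume, one way to find the starting iteration is to scan progress.jsonl for the last event that carries an "iter" field. This is a sketch under an assumption the document does not spell out: "next iteration" is taken to mean the last logged iteration number plus one, even if that iteration did not reach a verdict. The function name `next_iteration` is hypothetical.

```python
import json
from pathlib import Path


def next_iteration(progress: Path) -> int:
    """Return the iteration to resume at, based on the last logged event."""
    if not progress.exists():
        return 1
    last_iter = 0
    with progress.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Events such as plan_written carry no "iter"; keep the last one that does.
            if "iter" in record:
                last_iter = int(record["iter"])
    return last_iter + 1
```

A stricter variant could resume at the same iteration when its last event is not a verdict; which behaviour the harness intends is not specified here.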
npx claudepluginhub kamilseghrouchni/harness --plugin harness