Skill

harness

BELCORT Planner → Generator → Evaluator pipeline. Invoke when the user runs a /harness:* slash command, when .harness/manifest.yaml is present, or when the user describes a substantial build task (3+ components, >15 minutes of work) and has not yet activated the harness. The procedure for each command lives in commands/*.md — this skill is the shared context: activation rules, agent communication protocol, subagent isolation, and TDD contract.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/harness:harness

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A Planner → Generator → Evaluator pipeline for Claude Code, adapted from Anthropic's research on long-running agent harnesses. Three fresh subagents, each with clean context, communicating through files in `.harness/`.

SKILL.md

178 lines · ~2.5k tokens

Stats

Parent stars: 0

Maintenance: Fair

Last Commit: Apr 18, 2026

BELCORT Harness Engine

This skill is a reference manual, not a procedure. Each slash command has its own self-contained procedure in commands/*.md. This file holds the cross-cutting contracts every command depends on.

Activation

This skill is live when any of the following are true:

The SessionStart hook detected .harness/manifest.yaml in the current directory (the hook injected <harness-state> into context)
The user ran /harness:sprint, /harness:quick, /harness:resume, or any other /harness:* command — the command file in commands/ is your entry point; this skill is the shared context
The user described something to build and there is at least a 1% chance the harness would help (the 1% rule in ~/.claude/CLAUDE.md)

SUBAGENT ESCAPE HATCH

If you were dispatched as a subagent — your prompt contains a <SUBAGENT-CONTEXT> block — SKIP this skill entirely. Subagents do one specific job (planning, generating, evaluating). They do NOT orchestrate the pipeline, do NOT re-invoke /harness:* commands, do NOT read this SKILL.md for procedure. The orchestrator is the only agent that reads this skill.
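The activation rules and the escape hatch can be read as a single check in which the escape hatch wins. The sketch below is illustrative only: `skill_applies` and its parameters are hypothetical names, and the 1% rule is left out because it is a judgment call rather than something that can be computed.

```python
from pathlib import Path

SUBAGENT_MARKER = "<SUBAGENT-CONTEXT>"


def skill_applies(prompt: str, project_dir: Path, ran_harness_command: bool = False) -> bool:
    """Return True when the orchestrator should treat this skill as live.

    Hypothetical helper: the real decision is made by the model reading
    this file, not by code.
    """
    # Escape hatch first: a dispatched subagent never orchestrates.
    if SUBAGENT_MARKER in prompt:
        return False
    # Otherwise the skill is live if a /harness:* command was run
    # or the manifest is present in the current project.
    manifest = project_dir / ".harness" / "manifest.yaml"
    return ran_harness_command or manifest.is_file()
```

Checking the marker before anything else mirrors the rule that a subagent skips the skill even when the manifest exists in its working directory.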

Commands

Each procedure lives in its own file under commands/. This is a pointer table, not the procedure itself.

Command	File	What it does
`/harness:sprint "<prompt>"`	commands/sprint.md	Full pipeline: plan → analyze → [human gate] → negotiate → build → evaluate → tuning → retrospect → merge
`/harness:quick "<prompt>"`	commands/quick.md	Fast: skip Planner, minimal contract, single build+QA pass
`/harness:resume`	commands/resume.md	Recover from any phase using `manifest.yaml` + `changelog.md`
`/harness:analyze`	commands/analyze.md	Cross-artifact consistency check (PRD ↔ architecture ↔ contract)
`/harness:negotiate`	commands/negotiate.md	Generator ↔ Evaluator contract negotiation (pre-build)
`/harness:validate`	commands/validate.md	13-point quality audit on existing spec files
`/harness:edit "<change>"`	commands/edit.md	Targeted spec edit with downstream propagation
`/harness:retrospective`	commands/retrospective.md	Post-merge drift analysis + spec sync
`/harness:tune-evaluator`	commands/tune-evaluator.md	Review divergence log, propose calibration updates
`/harness:audit`	commands/audit.md	Verification debt scan
`/harness:setup`	commands/setup.md	One-time install of harness rules into `~/.claude/CLAUDE.md`

Subagent Isolation Protocol

This is the GAN-style insight from the Anthropic harness research: the agent judging the work must have a separate context from the agent doing the work. Any violation invalidates the whole pipeline.

Rules every command follows:

Each agent runs as a fresh subagent via claude -p "..." — not a nested Claude conversation.
The Evaluator MUST NEVER share context with the Generator. Always dispatch as separate processes.
Every dispatch prompt includes a <SUBAGENT-CONTEXT> block telling the subagent: you were dispatched for ONE job; do NOT re-invoke the harness pipeline; if SessionStart or SKILL.md fires in your context, SKIP IT.
Subagents write their output to .harness/features/<current>/<filename>.md; the orchestrator reads those files. Never back-channel via conversation.
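A minimal sketch of how a dispatch following these rules might look from a script. Only `claude -p` and the `<SUBAGENT-CONTEXT>` block come from the rules above; the block's exact wording, its closing tag, and the names `build_dispatch` and `dispatch` are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Assumed wording: the rules only say what the block must tell the subagent.
SUBAGENT_CONTEXT = """<SUBAGENT-CONTEXT>
You were dispatched for ONE job: {role}.
Do NOT re-invoke the harness pipeline.
If SessionStart or SKILL.md fires in your context, SKIP IT.
Write your output to {output_path}.
</SUBAGENT-CONTEXT>"""


def build_dispatch(role: str, task: str, feature: str, filename: str):
    """Build the argv for one fresh subagent and the file it must write."""
    output_path = Path(".harness") / "features" / feature / filename
    header = SUBAGENT_CONTEXT.format(role=role, output_path=output_path.as_posix())
    return ["claude", "-p", header + "\n\n" + task], output_path


def dispatch(role: str, task: str, feature: str, filename: str, cwd: str = ".") -> str:
    """Run the subagent as its own process, then read its result from disk."""
    argv, output_path = build_dispatch(role, task, feature, filename)
    subprocess.run(argv, cwd=cwd, check=True)
    # The result comes from the file, never from the subprocess conversation.
    return (Path(cwd) / output_path).read_text(encoding="utf-8")
```

Calling `dispatch` once for the Generator and once for the Evaluator yields two separate processes, so neither sees the other's context.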

Agent Communication Protocol

Agents NEVER share conversation context. They communicate exclusively via .harness/ files. Each feature has its own folder under .harness/features/NNN-name/ for scoped artifacts.

GLOBAL files (persist across features):
  .harness/spec/          — prd.md, architecture.md, constitution.md, evaluator-notes.md (optional)
  .harness/evaluator/     — criteria.md (grading rubric), examples.md, tuning-log.md
  .harness/ROADMAP.md     — shipped / in-progress / planned / considered
  .harness/manifest.yaml  — current state + feature tracking
  .harness/progress/      — changelog.md, decisions.md (ADRs), known-issues.md (append-only)

PER-FEATURE files (isolated in .harness/features/NNN-name/):
  contract.md              — what this feature builds (Planner writes draft, negotiate finalizes)
  proposal.md              — Generator's implementation plan (written during negotiate)
  review.md                — Evaluator's review of the proposal (written during negotiate)
  implementation-report.md — what was built (Generator writes)
  eval-report.md           — PASS/FAIL verdict (Evaluator writes)
  analysis-report.md       — cross-artifact check (optional, from /harness:analyze)
  retrospective.md         — drift analysis (optional, from /harness:retrospective)
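The per-feature layout lends itself to a small lookup of which agent writes which artifact. The sketch below is hypothetical: the writer sets are read off the listing above and the Flow section, the three-digit padding is inferred from the `NNN-name` pattern, and `feature_dir` and `may_write` are illustrative names.

```python
from pathlib import Path

# Who writes each per-feature artifact, as described in this document.
ARTIFACT_WRITERS = {
    "contract.md": {"planner", "generator"},  # Planner drafts, Generator finalizes
    "proposal.md": {"generator"},
    "review.md": {"evaluator"},
    "implementation-report.md": {"generator"},
    "eval-report.md": {"evaluator"},
    "analysis-report.md": {"analyze"},
    "retrospective.md": {"retrospective"},
}


def feature_dir(root: Path, number: int, name: str) -> Path:
    """Return .harness/features/NNN-name under the project root."""
    return root / ".harness" / "features" / f"{number:03d}-{name}"


def may_write(agent: str, artifact: str) -> bool:
    """True if the agent is a listed writer of the artifact."""
    return agent in ARTIFACT_WRITERS.get(artifact, set())
```

For example, `may_write("evaluator", "implementation-report.md")` is false, which matches the rule that the Evaluator only reads the Generator's report.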

Flow

Planner      → writes → spec/, evaluator/criteria.md,
                        features/NNN/contract.md (DRAFT),
                        ROADMAP.md (in-progress entry), manifest.yaml
[analyze]    → reads  → spec/, features/NNN/contract.md
             → writes → features/NNN/analysis-report.md
[negotiate]  → Generator writes features/NNN/proposal.md
               Evaluator writes features/NNN/review.md
               (iterate up to 3 rounds)
               Generator writes features/NNN/contract.md (FINAL, overwrites draft)
Generator    → reads  → spec/, features/NNN/contract.md (final)
             → writes → source code, git commits, progress/changelog.md,
                        features/NNN/implementation-report.md
Evaluator    → reads  → evaluator/criteria.md, evaluator/examples.md,
                        spec/evaluator-notes.md (if exists),
                        features/NNN/implementation-report.md,
                        features/NNN/contract.md (final), spec/
             → writes → features/NNN/eval-report.md
[retro]      → reads  → features/NNN/* + source code
             → writes → features/NNN/retrospective.md, spec updates, ROADMAP.md

On retry:
Generator    → reads  → features/NNN/eval-report.md (what failed)
             → writes → fixes + updated implementation-report.md

Tuning (after every evaluation, PASS or FAIL):
[orchestrator asks human]
[if divergence]
Orchestrator → writes → .harness/evaluator/tuning-log.md (append)
             → writes → .harness/evaluator/examples.md (if calibration example agreed)
             → writes → .harness/spec/evaluator-notes.md (if project-specific)

Periodic / on pattern detection:
[/harness:tune-evaluator]
Orchestrator → reads  → .harness/evaluator/tuning-log.md
             → writes → .harness/evaluator/examples.md (new examples)
             → writes → ${CLAUDE_PLUGIN_ROOT:-$HOME/.claude}/agents/evaluator.md (prompt changes, rare)
             → writes → .harness/progress/decisions.md (ADR for any prompt change)

TDD Protocol (Generator enforces)

Every FR/AC in the contract follows this cycle:

RED — write the failing test first, run it, confirm it fails
GREEN — write the minimum code to make it pass
REFACTOR — clean up with tests staying green
COMMIT — atomic: [harness:build] <behavior description>

No code without a failing test. No exceptions. The Evaluator verifies test-first discipline via git log archaeology.
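One small piece of the git log check can be sketched as a filter on commit subjects. This is an assumption about how the commit format might be matched, not the Evaluator's actual method; `is_build_commit` and `build_commits` are hypothetical names, and the subjects would come from something like `git log --format=%s`.

```python
import re

# "[harness:build] <behavior description>" with a non-empty description.
BUILD_COMMIT = re.compile(r"^\[harness:build\] \S.*$")


def is_build_commit(subject: str) -> bool:
    """True if the commit subject follows the harness build format."""
    return bool(BUILD_COMMIT.match(subject))


def build_commits(subjects):
    """Keep only the subjects that follow the harness build format."""
    return [s for s in subjects if is_build_commit(s)]
```

Matching the format does not prove test-first discipline on its own; it only narrows the history to the commits worth inspecting.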

Session Start Behavior

When starting a new Claude Code session in any project:

IF .harness/manifest.yaml exists:
  The SessionStart hook has already injected <harness-state> with phase + current feature.
  IF phase ≠ "complete":
    Mention: "This project has harness state (phase: [X]).
              Run /harness:resume to continue."

Don't auto-resume — just notify. The user may want to do something else first.
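The notify-only behavior above can be expressed as a function that returns a message or nothing. This is a sketch under the assumption that the phase has already been read from the manifest; `session_start_notice` is a hypothetical name.

```python
from typing import Optional


def session_start_notice(phase: Optional[str]) -> Optional[str]:
    """Return the reminder to show, or None when there is nothing to say.

    phase is None when .harness/manifest.yaml does not exist.
    """
    if phase is None or phase == "complete":
        return None
    # Notify only; resuming is left to the user.
    return f"This project has harness state (phase: {phase}). Run /harness:resume to continue."
```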

Evaluator Criteria (4 gradable dimensions)

Criterion	Threshold	How tested
Functionality	6/10	Playwright: exercise all flows + edge cases
Code Quality	6/10	Source review against constitution
Test Coverage	6/10	Run suite + check TDD evidence in git log
Product Depth	5/10	Use app as real user, try to break it

ANY criterion below threshold = FAIL → Generator retries with feedback.
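The pass/fail rule can be sketched as a comparison of each score against its threshold. The thresholds come from the table above; the function name `verdict` and the shape of its input are assumptions.

```python
# Minimum passing score (out of 10) per criterion, from the table above.
THRESHOLDS = {
    "Functionality": 6,
    "Code Quality": 6,
    "Test Coverage": 6,
    "Product Depth": 5,
}


def verdict(scores: dict):
    """Return ("PASS" | "FAIL", list of criteria below threshold)."""
    failing = [name for name, minimum in THRESHOLDS.items() if scores[name] < minimum]
    return ("FAIL" if failing else "PASS"), failing
```

A score equal to its threshold passes, since only a score below the threshold fails. The list of failing criteria is the kind of feedback the Generator would need on retry.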

Manifest schema

harness: { version: "1.2", model: "claude-opus-4-6", model_tuning_revision: 0 }
project: { name: "", description: "" }
config: { max_retries: 3, max_negotiation_rounds: 3, testing: { unit: vitest, e2e: playwright } }
state:
  phase: planning|analyzing|negotiating|building|evaluating|retrospective|complete
  current_feature: "001-feature-name"
  current_task: "FR-005"
  negotiation_round: 0
  retry_count: 0
features: { completed: [], in_progress: "", planned: [] }
verification_debt: { deferred: [], pending_human: [] }
tuning_debt:
  unreviewed_divergences: 0
  patterns_pending: []
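A minimal sanity check on a manifest might look like the sketch below. It assumes the YAML has already been parsed into a plain dict (parsing is out of scope here), and treating a counter above its configured maximum as a problem is an assumption rather than documented behavior. `validate_manifest` is a hypothetical name.

```python
# Allowed values of state.phase, from the schema above.
PHASES = (
    "planning", "analyzing", "negotiating", "building",
    "evaluating", "retrospective", "complete",
)


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    state = manifest.get("state", {})
    config = manifest.get("config", {})
    phase = state.get("phase")
    if phase not in PHASES:
        problems.append(f"unknown phase: {phase!r}")
    # Assumed rule: counters should not exceed their configured maximums.
    if state.get("negotiation_round", 0) > config.get("max_negotiation_rounds", 3):
        problems.append("negotiation_round exceeds max_negotiation_rounds")
    if state.get("retry_count", 0) > config.get("max_retries", 3):
        problems.append("retry_count exceeds max_retries")
    return problems
```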

Rules

If .harness/manifest.yaml exists, read it before doing anything else.
Evaluator is ALWAYS a separate subagent — never self-evaluate in the same context.
Every technical decision must align with .harness/spec/constitution.md.
Agents communicate via .harness/ files, never via shared context.
When uncertain, ask the human ONE focused question.
Planner does NOT specify files, components, data models, or API paths — those are negotiated between Generator and Evaluator before building.
