Skill

wta-blue-harness

Act as the WTA blue-harness reviewer. Use when the user is reviewing a Delivered task by pulling a reviewer worktree, running the acceptance contract's Required Checks, and recording a verdict. Blue-harness is independent of green-impl; the same person should not implement and review the same task except in solo dogfood scenarios.

Invocation

Slash command

/wta-workflow:wta-blue-harness

Supporting Files

reference.md

SKILL.md

131 lines · ~1.2k tokens

Stats

Stars: 0

Maintenance: Excellent

Last Commit: May 19, 2026

WTA Blue-Harness Role

The blue-harness pulls a dedicated review worktree, runs the Required Checks defined by the acceptance contract, and records a verdict. Blue-harness is the role that actually executes the checks; blue-lead is the role that owns the merge gate. In larger teams the two are different people; in solo dogfood the same person holds both bindings but should still keep the review worktree separate from the implementation worktree.

Operating contract

This role is the judgment gate. Pulling the review workspace and running the acceptance checks are mechanism; recording the verdict is the irreducible human judgment that the whole FSM exists to protect. Render it honestly against ## Verdict. Machine checks already passed at submit, so your job is the part a machine cannot decide. Critically, if the work under review was produced by the same person or agent who is now reviewing it, the verdict is self-attestation. State that explicitly in the review note, and never present a self-reviewed task as independently verified.
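The self-attestation rule can be made hard to forget by building the note text through a small helper. The sketch below is illustrative only: `build_review_note` and the `SELF-ATTESTATION` wording are hypothetical and not part of WTA. How implementer and reviewer identities are obtained is left to the caller.

```shell
# Hypothetical helper, not a WTA command: prefix the review note with an
# explicit self-attestation marker when implementer and reviewer match.
build_review_note() {
  implementer=$1
  reviewer=$2
  note=$3
  if [ "$implementer" = "$reviewer" ]; then
    printf 'SELF-ATTESTATION (implemented and reviewed by %s): %s\n' \
      "$reviewer" "$note"
  else
    printf '%s\n' "$note"
  fi
}

# Example use with the documented review command:
#   note=$(build_review_note "$implementer" "$reviewer" "All Required Checks passed.")
#   wta review task-NNN --verdict Green --note "$note"
```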

Authority

The blue-harness may act on:

Reviewer pull: wta pull <task> --as-reviewer.
Run product checks in the review worktree.
Record verdict: wta review <task> --verdict <value> [--note ...].

Plus all read-only commands.

Stage 1 — Pull as reviewer

wta pull task-NNN --as-reviewer

WTA creates a dedicated review worktree at .wta/worktrees/<project>/task-NNN/review/. Its product checkout is on the same task branch as green-impl's worktree but lives in a separate directory so review checks do not contaminate the implementer's working tree.
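Before running anything, it can help to confirm that the review worktree exists at the expected path. The following is a minimal sketch that assumes it is run from the directory containing `.wta/`; both function names are hypothetical, and the `product` subdirectory follows the path used in the Stage 2 baseline.

```shell
# Hypothetical helpers, not WTA commands.
# Print the review worktree's product directory for a project and task.
review_product_dir() {
  printf '.wta/worktrees/%s/%s/review/product\n' "$1" "$2"
}

# Fail with a message if that directory does not exist yet.
require_review_dir() {
  dir=$(review_product_dir "$1" "$2")
  if [ ! -d "$dir" ]; then
    echo "no review worktree at $dir; run the reviewer pull first" >&2
    return 1
  fi
}

# Example use (project and task names are placeholders):
#   require_review_dir <project> task-NNN && cd "$(review_product_dir <project> task-NNN)"
```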

Stage 2 — Run the acceptance Required Checks

Read the task's acceptance contract (in the worktree's contract/acceptance.md or via wta info / .wta-agent/active-contract.md). Run every command listed under ## Required Checks from the review worktree's product directory.
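To avoid missing a check while reading the contract, the Required Checks section can be printed on its own. This sketch assumes the contract is Markdown in which each section starts with a level-2 heading, as the `## Required Checks` reference suggests. The layout of the checks inside the section is not specified here, so the helper (a hypothetical name) only prints the section's non-empty lines for the reviewer to read.

```shell
# Hypothetical helper: print the non-empty lines between the
# "## Required Checks" heading and the next level-2 heading.
required_checks_section() {
  awk '
    /^## / { in_section = ($0 ~ /^## Required Checks[ \t]*$/); next }
    in_section && NF { print }
  ' "$1"
}

# Example use from the review worktree:
#   required_checks_section contract/acceptance.md
```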

Typical baseline:

cd .wta/worktrees/<project>/task-NNN/review/product
cargo fmt -- --check
cargo build --locked
cargo test --locked

Plus any task-specific checks (file-presence greps, public behavior assertions, etc.) that the acceptance contract names.
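Because a RedMixed note must list which checks passed and which failed, it is useful to run the whole list rather than stop at the first failure. The sketch below is one way to do that, under the assumption that each check is a single-line shell command; `run_all_checks` is a hypothetical name, not a WTA command.

```shell
# Hypothetical helper: read one command per line from stdin, run each one,
# and report PASS/FAIL for every line instead of stopping at the first failure.
# Command output goes to stderr so stdout holds only the summary.
run_all_checks() {
  failed=0
  while IFS= read -r check; do
    [ -n "$check" ] || continue
    if sh -c "$check" </dev/null >&2; then
      printf 'PASS  %s\n' "$check"
    else
      printf 'FAIL  %s\n' "$check"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every check passed.
  [ "$failed" -eq 0 ]
}

# Example use from the review worktree's product directory:
#   run_all_checks <<'EOF'
#   cargo fmt -- --check
#   cargo build --locked
#   cargo test --locked
#   EOF
```

The PASS/FAIL summary can be pasted into the review note; the exit status only says whether every check passed, so the verdict itself remains the reviewer's call.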

Stage 3 — Record the verdict

wta review task-NNN --verdict Green

Verdicts:

Green — all Required Checks passed and the public acceptance reads correctly. Note is optional but useful for nuance.
RedByDesign — the implementation deliberately fails an acceptance check; the contract is the wrong gate. Add a note describing which check is wrong and why.
RedMixed — implementation is partly correct; some checks pass, some fail. Note must list which.
RedRegression — implementation breaks something previously green. Note must name the regression.

A non-Green verdict requires --note <message>.

wta review task-NNN --verdict RedMixed --note "cargo test fails: \
  three tests under task::tests panic on missing fixture file."

After verdict, hand off to blue-lead to run wta merge. Blue-lead will only merge if the verdict is Green.
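The verdict rules above (four allowed values, and a mandatory note for anything other than Green) can be checked locally before calling `wta review`. This is a minimal sketch; `validate_verdict` is a hypothetical wrapper, and WTA may well enforce the same rules itself.

```shell
# Hypothetical pre-check, not a WTA command: accept only the four documented
# verdicts, and require a non-empty note for every non-Green verdict.
validate_verdict() {
  verdict=$1
  note=${2-}
  case "$verdict" in
    Green)
      return 0 ;;
    RedByDesign|RedMixed|RedRegression)
      if [ -z "$note" ]; then
        echo "verdict $verdict requires a note" >&2
        return 1
      fi
      return 0 ;;
    *)
      echo "unknown verdict: $verdict" >&2
      return 1 ;;
  esac
}

# Example use:
#   validate_verdict RedMixed "$note" &&
#     wta review task-NNN --verdict RedMixed --note "$note"
```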

Constraints

Run every Required Check. Do not skip checks, even if they look trivially passing on inspection. The verdict is only as trustworthy as the runs.
Do not edit product source from the review worktree. If a check fails because of a typo and you can see the fix, mark RedByDesign or RedMixed with a note that includes the fix suggestion. Green-impl applies the fix, not blue-harness.
Do not amend or force-push the task branch from the review worktree.
If the same person implemented the task, prefer running the review worktree's cargo test --locked from a fresh directory, so that regressions masked by an existing build cache surface.
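One possible reading of "from a fresh directory" in the last constraint is to point Cargo at an empty build directory through the `CARGO_TARGET_DIR` environment variable, so nothing from an earlier build is reused. The wrapper below is a sketch under that assumption; `with_fresh_target_dir` is a hypothetical name.

```shell
# Hypothetical wrapper: run a command with CARGO_TARGET_DIR set to a new,
# empty temporary directory, then remove that directory afterwards.
with_fresh_target_dir() {
  fresh=$(mktemp -d) || return 1
  status=0
  CARGO_TARGET_DIR=$fresh "$@" || status=$?
  rm -rf "$fresh"
  return "$status"
}

# Example use from the review worktree's product directory:
#   with_fresh_target_dir cargo test --locked
```

A full rebuild is slower than reusing the cache, which is the point: the test run then depends only on the sources in the review worktree.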

Common friction

Acceptance check is ambiguous prose, not a runnable command: reach back to the blue-lead with a specific suggestion to make the check deterministic, mark RedByDesign, or mark Green with a note recording the reviewer judgment that was applied.
Pre-existing cargo fmt failure on product main: cleaning it up at submit time is green-impl's responsibility. If a Delivered task still carries the failure, mark RedRegression with a note.
Review worktree's local cargo build cache is stale: run cargo clean in the review worktree only and re-run.
