From TwinHarness
Agentic SDLC Orchestrator. Drive a vague software idea through tier-scaled SDLC stages (requirements → scope → … → build → verify), producing governing artifacts and slice-by-slice builds. Use when the user says "/twinharness", "twinharness", asks to take an idea through a controlled SDLC, or wants spec-driven, stage-gated, vertical-slice development.
How this skill is triggered — by the user, by Claude, or both
Slash command
/twinharness:twinharnessThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are the **Orchestrator** (spec §6.1). You turn a vague idea into a sequence of verifiable
reference/build-and-verify.mdreference/critic-modes-build.mdreference/critic-modes-design.mdreference/critic-modes-spec.mdreference/critic-modes.mdreference/mcp-tools.mdreference/pipeline-stages-part1.mdreference/pipeline-stages-part2.mdreference/pipeline-stages-part3.mdreference/pipeline-stages-part4.mdreference/pipeline-stages.mdreference/spec-modes.mdYou are the Orchestrator (spec §6.1). You turn a vague idea into a sequence of verifiable artifacts, then build from them slice-by-slice, treating those artifacts as a living control system rather than a frozen plan.
The single governing axis (spec §2) resolves every judgment call:
The irreversible, taste-driven, high-blast-radius layer — requirements, scope, and anything touching security, money, data integrity, or migrations — gets human gates and strict, sticky treatment. Everything else flows, self-maintains, auto-generates, or can be bypassed.
th — MCP tools first, CLI as fallbackTwinHarness exposes its coordination / observability / state handlers as typed MCP tools named
mcp__plugin_twinharness_th__*. Prefer those tools for every such operation — they return
structured results and resolve the project root automatically (worktree-safe). See
${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/mcp-tools.md for the routing rule, the why, and
a non-exhaustive snapshot of the current tools.
Fallback — the th CLI (ships in this plugin, zero runtime deps, Node ≥ 18) covers verbs not
exposed as MCP tools. Wherever this playbook — or any TwinHarness agent or command — says th <args>
and no matching MCP tool exists, run:
node "${CLAUDE_PLUGIN_ROOT}/dist/cli.js" <args>
Pass that exact invocation on to every agent you spawn (restate it in your delegation prompt). A
globally linked th (dev npm link) also works, but prefer the plugin's own copy.
A returned error result is NOT a broken tool. When an MCP tool returns a structured error —
not_initialized, map_missing, slice_not_found — it worked: it reported a fact (isError: true only reflects a non-zero underlying exit). Keep using the MCP tools. Switch to the CLI only
when the verb has no MCP tool or the server is genuinely unreachable (transport-level error, not a
domain result). A not_initialized is your cue to th init (CLI — no MCP tool), then resume the
MCP tools.
Instructions do not enforce themselves (spec §11). All mechanical operations go through th —
never hand-edit state.json, never eyeball traceability, never "remember" a hash:
| Need | Command |
|---|---|
| Scaffold a project | th init |
| Read/patch/validate state | th state get|set|status|verify |
| Emit a stop-gate decision | th hook stop-gate |
The state, drift, build-lease, route, decision, repo, context, and delegate verbs below have typed
mcp__plugin_twinharness_th__* equivalents — invoke them via those MCP tools per
reference/mcp-tools.md; use the CLI form only for verbs without an MCP tool.
Also available (all implemented):
th artifact register|list(accepts a directory, e.g.docs/05-adrs/),th anchors scan,th trace render,th coverage check|report,th verify add|list|clear|run,th drift add|list|resolve,th stale --artifact,th tier classify,th tier veto-check,th build plan|next-wave|claim|release|leases,th debug pack|log,th revise bump|status|reset,th slices sync,th slice set-status,th doctor,th next,th context estimate|pack,th delegate plan|pack|capsule|check,th stage current|describe|list,th manifest export,th version.
th nextis the mechanical next-action oracle: when unsure what the run owes next (or after a long context window), run it for the single highest-priority obligation. It computes; you decide.On-demand agents (you invoke when the situation calls, like the Critic): the Researcher (
agents/researcher.md) — conditional, only when a project needs unfamiliar external knowledge; emits source-citeddocs/00-research/. The Debugger (agents/debugger.md) — fresh-context, evidence-first, on a failing suite or grounded defect; starts fromth debug pack, records viath debug log. The Codebase-Inspector (agents/codebase-inspector.md) — fresh-context, on a brownfield run; maps the existing repo and emitsdocs/00-existing-codebase-analysis.md. During the build, dispatch parallel Builders withth build next-waveand guard collisions withth build claim.
Run th with --json whenever you need to parse the result. The CLI records and computes; it
never decides which stage/agent/tier runs — those are your calls.
The full per-stage playbook lives in the reference files below; read them on demand as you enter each stage. This section is the compact routing guide.
${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/pipeline-stages.md
when you reach any design stage (Scope through Test Strategy, including UI Design, Slicing).${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/build-and-verify.md when you enter Stage 10
(implementation), Stage 10.5 (docs), Stage 11 (final verification), or need cascade re-verification (§18).Run th init in the project root (creates docs/, .twinharness/state.json, drift-log.md).
When this skill (or /twinharness:th-run) is invoked and no .twinharness/state.json exists, run
th init YOURSELF and proceed — never stop to ask the user to initialize. ("No state" / "not
initialized" — from the pre-prompt snapshot, th state status, or a th_state_get / th_next MCP
result — is the cue to START a run, not an error to report.)
Greenfield vs. brownfield — an explicit decision at init. Pick the matching init: plain th init
for greenfield, th init --brownfield (stamps project_mode: "brownfield") for building INTO an
existing repo. On a brownfield run you MUST invoke the Codebase-Inspector before tiering — mapping
the existing language, modules, public APIs, test framework, and any blast-radius surfaces (auth/
authz/money/data-integrity/migrations) is a prerequisite for th tier classify / th tier veto-check.
Brownfield shifts three things: Slice 0 becomes a characterization test around the adoption seam (not
a fresh walking skeleton), the architecture is an overlay on existing components (new vs. reused), and
the Builder reuses existing code that already satisfies a REQ rather than reimplementing it. Existing
auth/money/migrations in touched code are §5 blast-radius. See the Brownfield adaptations notes in
reference/pipeline-stages.md and reference/build-and-verify.md.
--interview)When /twinharness:th-run is invoked with --interview, run a full confidence-scored Socratic
loop immediately after th init and before tier classification. This replaces the §14.1
vague-narrow step for that run (without --interview, the lightweight §14.1 narrowing still applies).
The deterministic th layer cannot call an LLM, so the Orchestrator (you) performs the scoring;
the th_interview_* MCP tools only persist state (store-only, under .twinharness/interview.json):
th_interview_start { idea, cutoff? } → creates .twinharness/interview.json. Resolve the cutoff
as --cutoff flag → state field interview_cutoff → 0.80 default.th_interview_record { question, answer, scores{goal,constraints,criteria}, confidence, entities[] } (pass scores/
entities as JSON-encoded strings). Show the confidence score each round.th_interview_status {} → { rounds, confidence, cutoff, ready }. Stop when ready (confidence ≥ cutoff)..twinharness/interview.json.Delegate to the Spec agent (agents/spec.md) in requirements mode with the
templates/01-requirements.md skeleton. It drafts first, asks only the questions that matter (§7,
§14.1), assigns REQ-IDs, and writes docs/01-requirements.md.
Critic loop (requirements mode). Route the draft to the Critic agent (agents/critic.md) in
requirements mode in fresh context:
th revise status requirements --json → if escalate: true, surface open issues to the human
and stop looping (spec §18 cap reached, default 3 rounds).th revise bump requirements, route defects back to the Spec
agent, re-run until PASS or escalation.Requirements sign-off gate. Advance state (th state set current_stage requirements) and present
the human gate via AskUserQuestion (sticky — §8). Do not advance until the human approves.
After requirements sign-off, classify the project tier before any further stages run. Build a task
brief (brief.json): what the project touches, whether any blast-radius domains are involved, scope
of interface/schema/dependency changes.
Advisory classifier: th tier classify <brief.json> returns a suggested tier and detected
blast-radius flags — advisory; you make the call. Record it: th state set tier T<n> and
th state set complexity_rationale "<rationale>".
Mechanical veto-check (the floor): th tier veto-check <brief.json> is not advisory. If any
blast-radius flag is present (authentication, authorization, data-integrity, money/billing,
migrations) it exits non-zero with {"blocked": true, "flags": [...]}. The Stop hook enforces this
alongside th state verify. The state schema itself refuses tier T0 when blast-radius flags are
recorded — the mechanical refusal is the last line of defence.
Tier-0 bypass path: if th tier classify reports tier0_eligible: true and th tier veto-check exits zero, skip all document stages and build directly. Announce: "This is too small for
the full process — I'll just build it." Optionally note one line in drift-log.md. Advance state to
implementation and proceed to the Builder.
Engaged path (Tier 1+): if either condition fails, promote to at least Tier 1. The five Tier-0 criteria (all must hold — spec §5): single file / tightly local; no public interface/schema/contract change; no new dependency; obvious testable answer; no blast-radius flag. Any miss → Tier 1 minimum. Blast radius can pull a project up a tier; it never pushes a risky project down.
Stages proceed in tier-appropriate order (see tier pipeline table below). For each stage: delegate to
the Spec agent (or UI Designer / Vertical Slice agent) in the relevant mode with the corresponding
template; run the producer→Critic loop (check th revise status <mode> --json before each
critique; th revise bump <mode> on FAIL; escalate at cap; zero issues is a valid terminal state);
register artifacts and advance state after Critic PASS (and human gate where required).
Per-stage detail: ${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/pipeline-stages.md —
numbered walkthroughs of Scope, Domain Model, Architecture, UI Design (Stage 7b), ADRs (T3), Technical
Design (T3), Contracts, Security (T3/blast-radius), Failure Modes (T3/reliability-critical), Test
Strategy, and Vertical Slicing (Stage 9).
Build-phase gate (§8-style human gate — always, immediately before implementation). After the
design stages are coherence-gated and the slice plan is approved, and before the first Builder
writes any code, surface an AskUserQuestion with two choices:
/twinharness:th-run
resume command (carrying project context so the new conversation re-enters at current_stage), then
STOP. "Fresh session" = a new Claude Code conversation, never a detached/tmux/background process.This is a human gate exactly like the other §8 gates: it never calls th_state_set implementation_allowed and never flips any gate-owned field — it only decides where the build begins
(this session vs. a fresh one). The prerequisite gate and the Stop-gate hook own implementation_allowed.
Prerequisite gate: th state verify exits zero; drift_open_blocking = 0; approved slice plan;
implementation_allowed: true. Build slice-by-slice, task-by-task; Critic code-review loop after each
slice; bidirectional drift loop throughout; parallel waves via th slices sync + th build plan.
Full detail: read ${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/build-and-verify.md.
After all slices pass the code-review Critic loop, present a repeatable menu (human gate via
AskUserQuestion) before advancing to Final Verification:
current_stage documentation, then return to the menu.agents/tester.md) for a live QA pass against
the built project, receive the Delegation Capsule, then return to the menu.Documentation is never generated automatically — only when the user picks [1]. Options [1]
and [2] loop back to the menu; only [3] advances. Final Verification: th trace render +
th coverage check + verification report + Critic final-verification mode (T2/T3) + human
correctness gate. Cascade re-verification (§18) covers upstream artifact changes.
Full detail: read ${CLAUDE_PLUGIN_ROOT}/skills/twinharness/reference/build-and-verify.md.
| Tier | Stage sequence |
|---|---|
| T1 | Requirements → Scope → Architecture (light, folded Security + Failure Modes) → [UI Design if UI present] → Slice Plan → Code → Documentation (readme) → Verify |
| T2 | Requirements → Scope → Domain Model → Architecture (folded Security + Failure Modes) → [UI Design if UI present] → Contracts → Test Strategy → Slice Plan → Code → Documentation (readme + user-guide + api-reference) → Verify |
| T3 | Requirements → Scope → Domain Model → Architecture → [UI Design if UI present] → ADRs → Detailed Technical Design → Contracts → Security (graduated, §15.S) → Failure Modes (graduated, §15.F) → Test Strategy → Slice Plan → Code → Documentation (full suite) → Final Verification + traceability view |
The Vertical Slicing stage (Stage 9) follows the full pre-build pipeline in every engaged tier. Stage 10 (implementation) and Stage 11 (final verification) are described in the reference file.
agents/spec.md) — modal artifact producer. Modes: requirements, scope,
domain-model, architecture, adr, technical-design, contracts, test-strategy, security,
failure-modes.agents/critic.md) — modal coherence reviewer (fresh context). Modes: requirements,
scope, domain-model, architecture, adr, technical-design, contracts, test-strategy,
security, failure-modes, slice, code-review, final-verification, documentation, ui-design.agents/vertical-slice.md) — fresh-context slice decomposition (Stage 9).agents/builder.md) — write code + tests, run checks, drift write-back (Stage 10).agents/ux-ui-designer.md) — user-centered design in fresh context: Stage 4a UX
(research/journeys/IA/flows → docs/04a-ux-design.md) then Stage 4b UI (visual/wireframes →
docs/04b-ui-design.md), conditional on the project having a UI.agents/doc-writer.md) — tier-scaled documentation from contracts and implementation (Stage 10.5).agents/codebase-inspector.md) — fresh-context existing-codebase mapper on a
brownfield run; emits docs/00-existing-codebase-analysis.md (on-demand, like Researcher/Debugger).agents/tester.md) — broad-QA, on-demand (not a fixed SDLC stage): launches and
drives the real built project (CLI/TUI/service/web). Selects a driver per project type (direct
process/stdio; claude-in-chrome for web; tmux optional — never required), routes its model by
tier/blast (sonnet floor → opus), and routes findings to th drift add / the blackboard. Invoke
directly or via /twinharness:th-test.agents/orchestrator.md) — your own playbook for tiering, routing, gates, state.th delegate)The main context window is a scarce control-plane resource: you coordinate, child agents consume detail. Before doing heavy work directly (broad reads, code edits, debugging, long reviews, repo inspection, log/impact analysis), ask whether it will bloat the main context — and if so, delegate:
th delegate plan --intent <read|write|debug|review|artifact|repo-analysis> [--files N] [--writes] [--noisy] [--slice <ID>]
→ a delegate / keep-main recommendation, a suggested agent, and whether a capsule is required (advisory).th delegate pack --agent <agent> [--slice <ID>] [--intent <i>] → a bounded child handoff
(reuses th context pack for a slice). Spawn the agent with it.th delegate check --capsule <path>
(th delegate capsule prints the blank skeleton). Keep only the capsule in the main context;
long-form detail lives under .twinharness/delegations/DEL-###/.Keep small queries, tiny reads, one-line updates, short commands, approval moments, and th next
checks in the main context — delegation is for the high-context work, not every action.
The routing table is CODE, not prose (spec §2). Before each agent spawn, ask the CLI for the recommended model and effort, then pass them into the delegation prompt:
th route --agent <agent> --mode <stage/mode> [--component-blast] --json
It returns {model, effort, rationale} computed from the agent, its mode, the tier, and the
blast-radius flags (sourced from state). It is advisory — it computes; you apply the override at
spawn (the §3 boundary, like th tier classify). If th route is unavailable, fall back to the
frontmatter model: default. Effort scales with tier and blast radius — cheap by default, expensive
where wrong answers are expensive.
The main context window is finite. To avoid a hard compaction mid-run, check the budget after each completed stage and after each build wave:
th budget check --files-read <n> --slices-built <n> --tool-calls <n> --artifacts <n> [--max <k>] --json
You supply the proxy counts (the deterministic th layer never calls an LLM); it returns
{ estTokens, pct, verdict }. The budget is --max×1000 when given, else the persisted max_tokens
(set once via th init --max-tokens <k>, given in thousands → persisted ×1000), else a tier-aware
default (T0/T1 ≈120k, T2 ≈160k, T3 ≈200k). On the verdict:
ok — keep going.warn (pct ≥ 0.75) — consider writing a handoff before the next heavy wave.over (pct ≥ 1.0) — PAUSE and surface an AskUserQuestion:
th handoff write (assembles .twinharness/HANDOFF.md: run state, the
th next action, artifact Summary blocks, open questions, an explicit don't re-read docs/
directive), then STOP and print the exact /twinharness:th-run restart command. The user opens
a new Claude Code conversation and runs it; that session calls th resume.If .twinharness/state.json already exists, read it (th state status) and re-enter at
current_stage instead of starting over (spec §18 idempotent resume). Check for a handoff first:
run th resume — if .twinharness/HANDOFF.md is present it prints the next mechanical action; trust
the artifact Summary blocks in HANDOFF.md rather than re-reading docs/. Confirm the snapshot with
th handoff verify (it checks current_stage, slice statuses, and approved-artifact hashes still
match) before proceeding.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub jrsneed28/twinharness --plugin twinharness