TwinHarness

Turns "build me X" into working, tested software by forcing the idea through requirements, scope, design, and slice-by-slice implementation with verification gates — as a Claude Code plugin.

Early development notice. TwinHarness is at v0.7.0. The pipeline has been exercised end-to-end and ships 1672 tests, green on CI. One test in tests/concurrency.test.ts is a platform-conditional skip: a POSIX-only permission-error case, intentionally skipped on Windows/root and covered on Linux/macOS CI. The project has limited real-world mileage, and interfaces may change before 1.0. Expect breaking changes. Use it, push its limits, file issues; just don't bet a production release on it yet.

What it is

TwinHarness is a Claude Code plugin: an agentic SDLC orchestrator that takes a vague software idea and produces working, tested software through a disciplined pipeline. It coordinates 16 specialized agents. The core pipeline consists of Orchestrator, Spec, Critic, Vertical-Slice, Builder, Test-Author, UX/UI-Designer (two ordered stages: 4a UX, 4b UI), Doc-Writer, Merge-Coordinator, Reconciler, Red-Team, and Librarian; Researcher, Debugger, Codebase-Inspector, and Tester run on demand. All of them are backed by a deterministic TypeScript CLI (th) that handles every mechanical operation: state, content hashing, REQ-ID traceability, coverage gates, the drift log, and a Stop hook that blocks Claude from claiming "done" while state is invalid or a blocking discovery is open.
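To make the Stop-hook rule concrete, here is a minimal sketch of a completion gate. It is not the th CLI's actual code: the HarnessState and Discovery shapes, their field names, and the completionGate function are all hypothetical, since the real state.json schema is not shown here. It only illustrates the stated rule that "done" is blocked while state is invalid or a blocking discovery is open.

```typescript
// Hypothetical shapes: the real .twinharness/state.json schema may differ.
interface Discovery {
  id: string;
  blocking: boolean; // assumed flag: does this discovery block completion?
  open: boolean;     // assumed flag: is it still unresolved?
}

interface HarnessState {
  valid: boolean;           // assumed flag: did state validation pass?
  discoveries: Discovery[]; // assumed list of logged discoveries
}

type GateResult =
  | { allow: true }
  | { allow: false; reasons: string[] };

// Decide whether a "done" claim may proceed, collecting every reason it may not.
function completionGate(state: HarnessState): GateResult {
  const reasons: string[] = [];
  if (!state.valid) reasons.push("state is invalid");
  for (const d of state.discoveries) {
    if (d.blocking && d.open) reasons.push(`blocking discovery ${d.id} is open`);
  }
  return reasons.length === 0 ? { allow: true } : { allow: false, reasons };
}

// Valid state, but one blocking discovery is still open: the gate refuses.
const result = completionGate({
  valid: true,
  discoveries: [{ id: "D-1", blocking: true, open: true }],
});
console.log(result);
```

Returning the full list of reasons, rather than a bare boolean, lets the hook tell the agent exactly what must be resolved before it can stop.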

Three things make it different from asking an agent to build something directly:

Artifacts govern; they don't decorate. Every stage produces a document that downstream stages are mechanically checked against. When reality diverges during the build, the document updates — in both directions.
The process scales with risk, not ceremony. A trivial change bypasses everything (Tier 0). A project touching auth, money, or migrations gets the strictest treatment, and that floor is enforced by code, not by promises.
Mechanical truths are code. State, hashing, coverage, drift counts, and the completion gate live in a tested CLI — not in prompt text a model could misremember.
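As an illustration of a floor "enforced by code", the sketch below clamps a proposed tier so that risky work can never be classified below the strictest tier. The function name enforceTierFloor, its parameters, and the exact matching rule are assumptions made for this example; only the Tier 0–3 range and the three risk areas come from the text above.

```typescript
type Tier = 0 | 1 | 2 | 3;

// Risk areas named in the text; the exact list the CLI checks is an assumption.
const BLAST_RADIUS_AREAS: readonly string[] = ["auth", "money", "migrations"];

// Hypothetical helper: whatever tier was proposed, work touching a
// blast-radius area is forced up to Tier 3.
function enforceTierFloor(proposed: Tier, touches: readonly string[]): Tier {
  const risky = touches.some((area) => BLAST_RADIUS_AREAS.includes(area));
  return risky ? 3 : proposed;
}

console.log(enforceTierFloor(0, ["docs"]));         // 0
console.log(enforceTierFloor(1, ["auth", "docs"])); // 3
```

Because the clamp runs in deterministic code, a model that proposes too low a tier for an auth change cannot talk its way past it.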

Who it's for: Claude Code users who want spec-driven, gated development instead of one-shot vibe-coding; people burned by agents that build the wrong thing or claim "done" when they aren't; teams that need traceability from requirements to code.

What a run looks like

Start with:

/twinharness:th-run build a CLI tool that tracks my reading list

Then, roughly:

Scaffolding. The Orchestrator initializes docs/, .twinharness/state.json, and drift-log.md in your project directory.
Requirements. A Spec agent drafts requirements, assigns REQ-IDs, and asks you only the questions that matter. A fresh-context Critic reviews the draft.
Your first gate. You see the requirements and are asked to approve or request changes. Once you sign off, requirements are sticky — only you can reopen them.
Tier classification. The Orchestrator sizes the project (Tier 0–3). Trivial → Tier 0 bypass. Risky blast-radius work → Tier 3 with more gates and more expensive models.
Design stages stream. Domain model, architecture, contracts, security/failure analysis, and test strategy run with Critic reviews but without interrupting you — except for genuinely irreversible choices (e.g. monolith vs. services) and blast-radius decisions (e.g. the auth scheme). If your project has a UI, the UX/UI-Designer runs two ordered stages — Stage 4a (UX: research, journeys, information architecture) then Stage 4b (UI: visual direction, screens, tokens) — and presents 2–3 directions at each, asking you to pick one before it details that stage.
Vertical slices, then build. A fresh-context agent decomposes the design into thin end-to-end slices. Builders implement them one at a time, or in conflict-free parallel waves when slices are independent, with tests included and a Critic review after each slice.
Documentation. A Doc-Writer agent generates tier-appropriate docs. Critic-reviewed; no human gate.
Verification. A final report separates what the Critic can certify (coherence) from what only tests and you can certify (correctness). You sign off.
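The "conflict-free parallel waves" in the build step can be pictured with a small scheduling sketch. This is one plausible approach, not TwinHarness's actual planner: the Slice shape, the dependsOn and files fields, and planWaves are hypothetical. It assumes two slices are independent when neither depends on the other and they touch no common file.

```typescript
// Hypothetical slice description; the real slice format is not shown here.
interface Slice {
  id: string;
  dependsOn: string[]; // assumed: ids of slices that must land first
  files: string[];     // assumed: paths the slice expects to touch
}

// Group slices into waves. Slices in one wave have all dependencies finished
// in earlier waves and claim disjoint sets of files.
function planWaves(slices: Slice[]): string[][] {
  const done = new Set<string>();
  let pending = [...slices];
  const waves: string[][] = [];

  while (pending.length > 0) {
    const wave: Slice[] = [];
    const claimed = new Set<string>();
    const deferred: Slice[] = [];

    for (const s of pending) {
      const ready = s.dependsOn.every((d) => done.has(d));
      const conflicts = s.files.some((f) => claimed.has(f));
      if (ready && !conflicts) {
        wave.push(s);
        s.files.forEach((f) => claimed.add(f));
      } else {
        deferred.push(s); // retry in a later wave
      }
    }

    // No progress means a dependency cycle or a reference to a missing slice.
    if (wave.length === 0) throw new Error("dependency cycle or unknown slice id");
    wave.forEach((s) => done.add(s.id));
    waves.push(wave.map((s) => s.id));
    pending = deferred;
  }
  return waves;
}

const waves = planWaves([
  { id: "A", dependsOn: [], files: ["a.ts"] },
  { id: "B", dependsOn: [], files: ["b.ts"] },
  { id: "C", dependsOn: [], files: ["a.ts"] },    // conflicts with A on a.ts
  { id: "D", dependsOn: ["A"], files: ["d.ts"] }, // must wait for A
]);
console.log(waves); // [ [ 'A', 'B' ], [ 'C', 'D' ] ]
```

Slices that share a file or wait on a dependency simply fall through to a later wave, which degrades to one-by-one building when nothing is independent.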

Architecture

flowchart TD
    Idea([User idea]) --> Orch[Orchestrator skill]

    Orch --> Tier{Tier classify}
    Tier -- T0 bypass --> Build
    Tier -- T1-T3 --> Spec

    Spec[Spec agent] --> CriticSpec[Critic — fresh context]
    CriticSpec -- FAIL --> Spec
    CriticSpec -- PASS --> HumanGate{Human gate<br/>requirements/scope}
    HumanGate --> DesignStages

    subgraph DesignStages[Design stages — stream with Critic reviews]
        direction LR
        D1[Domain model] --> D2[Architecture]
        D2 --> D3a[UX design 4a<br/>conditional · gated]
        D3a --> D3b[UI design 4b<br/>conditional · gated]
        D3b --> D4[Contracts / security / test strategy]
    end

