From harness-claude
Analyzes specs and plans for soundness before sign-off. Auto-fixes inferrable issues and surfaces design decisions for review.
Slash command: /harness-claude:harness-soundness-review
> Deep soundness analysis of specs and plans. Auto-fixes inferrable issues, surfaces design decisions to you. Runs automatically before sign-off.
- --mode spec: Run spec-mode checks (S1-S7). Invoked by harness-brainstorming.
- --mode plan: Run plan-mode checks (P1-P7). Invoked by harness-planning.

No spec or plan may be signed off without a converged soundness review. Inferrable fixes are applied silently. Design decisions are always surfaced to the user.
Every finding conforms to this structure:
{
"id": "string — unique identifier",
"check": "string — e.g. S1, P3",
"title": "string — one-line summary",
"detail": "string — explanation with evidence",
"severity": "error | warning — errors block sign-off",
"autoFixable": "boolean — whether fixable without user input",
"suggestedFix": "string | undefined — what the fix would do",
"evidence": ["string[] — references to spec/plan sections and codebase files"]
}
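The structure above maps naturally onto a TypeScript type. The following is a minimal sketch rather than the skill's actual definition: the SoundnessFinding name appears elsewhere in this document, but the Severity alias and the isSoundnessFinding guard are hypothetical helpers added for illustration.

```typescript
// Hypothetical TypeScript rendering of the finding structure shown above.
type Severity = "error" | "warning";

interface SoundnessFinding {
  id: string;            // unique identifier, e.g. "S1-001"
  check: string;         // e.g. "S1", "P3"
  title: string;         // one-line summary
  detail: string;        // explanation with evidence
  severity: Severity;    // errors block sign-off
  autoFixable: boolean;  // whether fixable without user input
  suggestedFix?: string; // what the fix would do
  evidence: string[];    // references to spec/plan sections and codebase files
}

// Runtime guard (an assumption, not part of the skill): verifies that an
// unknown value has the shape above before it is treated as a finding.
function isSoundnessFinding(value: unknown): value is SoundnessFinding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.check === "string" &&
    typeof v.title === "string" &&
    typeof v.detail === "string" &&
    (v.severity === "error" || v.severity === "warning") &&
    typeof v.autoFixable === "boolean" &&
    (v.suggestedFix === undefined || typeof v.suggestedFix === "string") &&
    Array.isArray(v.evidence) &&
    v.evidence.every((e) => typeof e === "string")
  );
}
```

A guard like this could reject malformed findings before they reach the fix or surfacing phases.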
Execute all checks for the active mode. Classify each finding as autoFixable: true or false. Record total issue count.
Run check_traceability to verify that all requirements in the spec/plan have corresponding implementation artifacts. Run validate_cross_check to verify plan-to-implementation alignment as part of the soundness assessment.
Before running checks, determine graph availability:
1. Check whether .harness/graph/ exists.
2. If it exists, use the graph tools:
   - query_graph: traverse module/dependency nodes to verify referenced patterns and architectural compatibility
   - find_context_for: search for related design decisions from other specs
   - get_relationships: verify dependency direction and layer compliance
   - get_impact: analyze downstream impact to verify dependency completeness

Per-check procedures include "Without graph" and "With graph" variants. Use whichever matches step 1.
Spec mode (--mode spec)

| # | Check | What it detects | Auto-fixable? |
|---|---|---|---|
| S1 | Internal coherence | Contradictions between decisions, technical design, and success criteria | No — surface to user |
| S2 | Goal-criteria traceability | Goals without success criteria; orphan criteria not tied to any goal | Yes — add missing links, flag orphans |
| S3 | Unstated assumptions | Implicit assumptions not called out (e.g., single-tenant, always-online) | Partially — infer obvious ones, surface ambiguous |
| S4 | Requirement completeness | Missing error/edge cases, failure modes; EARS unwanted-behavior gaps | Partially — add obvious error cases, surface design-dependent |
| S5 | Feasibility red flags | Design depends on nonexistent codebase capabilities or incompatible patterns | No — surface with evidence |
| S6 | YAGNI re-scan | Speculative features that crept in during conversation | No — surface to user |
| S7 | Testability | Vague success criteria not observable or measurable ("should be fast") | Yes — add thresholds where inferrable |
S1 (internal coherence)
Analyze: Decisions table, Technical Design, Success Criteria, Non-goals.
Detection:
Classification: Always severity: "error", autoFixable: false. Contradictions require user judgment.
Example:
{
"id": "S1-001",
"check": "S1",
"title": "Decision contradicts Technical Design",
"detail": "D3 says 'use SQLite' but Technical Design > Data Layer describes PostgreSQL with migrations.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Align Technical Design with decision (SQLite) or update decision to PostgreSQL.",
"evidence": ["Decisions D3: 'Use SQLite'", "Technical Design > Data Layer: 'PostgreSQL schema'"]
}
S2 (goal-criteria traceability)
Analyze: Overview (goals), Success Criteria.
Detection:
Classification:
- Goals without success criteria: severity: "warning", autoFixable: true. Fix: add criterion derived from Technical Design.
- Orphan criteria: severity: "warning", autoFixable: false. Removing criteria is a design decision.

Example:
{
"id": "S2-001",
"check": "S2",
"title": "Goal has no success criterion",
"detail": "Goal 'Support offline mode' has no corresponding criterion.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'App functions without network for all read operations, returning cached data.'",
"evidence": ["Overview: 'Support offline mode'", "Success Criteria: no match"]
}
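One way to picture the S2 check is as a two-way lookup between goals and criteria. The sketch below is only an illustration, assuming each criterion carries an explicit list of goal ids, which real specs may not provide; the Goal, Criterion, TraceabilityGaps, and findTraceabilityGaps names are all hypothetical.

```typescript
// Hypothetical shapes: real specs are prose, so these links would have to be
// extracted (or inferred) before a check like this could run.
interface Goal { id: string; text: string }
interface Criterion { id: string; text: string; goalIds: string[] }

interface TraceabilityGaps {
  goalsWithoutCriteria: Goal[]; // auto-fixable: a criterion can be added
  orphanCriteria: Criterion[];  // surfaced: removing a criterion is a design decision
}

function findTraceabilityGaps(goals: Goal[], criteria: Criterion[]): TraceabilityGaps {
  const knownGoals = new Set(goals.map((g) => g.id));
  const coveredGoals = new Set<string>();
  for (const criterion of criteria) {
    for (const goalId of criterion.goalIds) {
      if (knownGoals.has(goalId)) coveredGoals.add(goalId);
    }
  }
  return {
    // Goals that no criterion points at.
    goalsWithoutCriteria: goals.filter((g) => !coveredGoals.has(g.id)),
    // Criteria that point at no known goal.
    orphanCriteria: criteria.filter((c) => !c.goalIds.some((id) => knownGoals.has(id))),
  };
}
```

The two result lists correspond to the two classifications: the first can be fixed silently, the second goes to the user.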
S3 (unstated assumptions)
Analyze: Technical Design, Decisions table, data structures, integration points.
Detection:
- With graph: use query_graph to check related modules' assumptions. Use find_context_for to surface conflicting design decisions.

Classification:
- Obvious assumptions (runtime, encoding): severity: "warning", autoFixable: true. Fix: add to Assumptions section.
- Ambiguous assumptions (concurrency, tenancy): severity: "warning", autoFixable: false. User decides.

Example:
{
"id": "S3-001",
"check": "S3",
"title": "Implicit Node.js runtime assumption",
"detail": "Technical Design references 'path.join' and 'fs.readFileSync' without declaring Node.js runtime.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add to Assumptions: 'Runtime: Node.js >= 18.x (LTS).'",
"evidence": [
"Technical Design > File Operations: path.join, fs.readFileSync",
"No Assumptions section"
]
}
{
"id": "S3-002",
"check": "S3",
"title": "Ambiguous concurrency model",
"detail": "Technical Design describes a background job processor but does not specify in-process, worker thread, or separate process. Affects error isolation and deployment.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Add decision specifying concurrency model: in-process event loop, worker_threads, or separate process.",
"evidence": [
"Technical Design > Job Processor: 'processes background jobs'",
"Decisions table: no concurrency entry"
]
}
S4 (requirement completeness)
Analyze: Technical Design (data structures, API endpoints, integration points), Success Criteria.
Detection:
Classification:
- Obvious error cases (file I/O, JSON, network): severity: "warning", autoFixable: true. Fix follows codebase patterns.
- Design-dependent error handling: severity: "warning", autoFixable: false.

Example:
{
"id": "S4-001",
"check": "S4",
"title": "Missing file-not-found error case",
"detail": "Config read with fs.readFileSync has no ENOENT handling. Codebase convention (packages/core/src/config.ts) returns defaults.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'If config file missing (ENOENT), return default config. Log debug message.'",
"evidence": [
"Technical Design: 'read config from harness.config.json'",
"Codebase: config.ts returns defaults on ENOENT"
]
}
{
"id": "S4-002",
"check": "S4",
"title": "Undefined retry strategy for external service",
"detail": "Technical Design calls an external API for license validation but specifies no timeout, unavailability, or error behavior. Design decision affects UX (block vs degrade).",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Add decision: 'When license API unavailable: (a) fail open with warning, (b) fail closed, or (c) cache last result for N hours.'",
"evidence": [
"Technical Design > License Check: 'call /api/validate on startup'",
"No fallback behavior specified"
]
}
S5 (feasibility red flags)
Analyze: Technical Design (referenced modules, dependencies, patterns, APIs).
Detection:
- With graph: use query_graph to verify modules exist and check dependencies. Use get_relationships for architectural compatibility. Use get_impact for cascading effects not in spec.

Classification: Always severity: "error", autoFixable: false. Feasibility problems require design revision.
Example:
{
"id": "S5-001",
"check": "S5",
"title": "Referenced function has different signature",
"detail": "Spec says 'validateDependencies(projectPath)' but actual signature is 'validateDependencies(config: ProjectConfig): ValidationResult'.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Update Technical Design to use actual signature with ProjectConfig parameter.",
"evidence": [
"Technical Design: 'call validateDependencies(projectPath)'",
"packages/core/src/validator.ts:42: actual signature"
]
}
S6 (YAGNI re-scan)
Analyze: Technical Design, Decisions table, Implementation Order.
Detection:
Classification: Always severity: "warning", autoFixable: false. Removing features is a design decision.
Example:
{
"id": "S6-001",
"check": "S6",
"title": "Speculative configuration option",
"detail": "'pluginDir' config option defined but no goal/criterion mentions plugins.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Remove pluginDir and plugin loading from Technical Design.",
"evidence": ["Technical Design: 'pluginDir: string'", "Overview/Criteria: no plugin mention"]
}
S7 (testability)
Analyze: Success Criteria.
Detection:
Classification:
- Vague criteria with inferrable thresholds: severity: "warning", autoFixable: true. Fix: replace vague qualifier with specific threshold.
- Unmeasurable criteria: severity: "error", autoFixable: false. User must rewrite.

Example:
{
"id": "S7-001",
"check": "S7",
"title": "Vague performance criterion",
"detail": "Criterion #3 says 'build should be fast'. Technical Design mentions 30-second CI timeout.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Replace with 'build completes in under 30 seconds on CI'.",
"evidence": ["Criteria #3: 'build should be fast'", "Technical Design > CI: '30-second timeout'"]
}
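The S7 classification can be sketched as a small function. This is an illustrative approximation only: the word list is hypothetical (the skill names "fast" as its example and does not enumerate other qualifiers), and the threshold argument stands in for whatever concrete limit could be inferred from the Technical Design.

```typescript
// Hypothetical word list: only "fast" comes from the skill's own example.
const VAGUE_QUALIFIERS = ["fast", "quick", "easy", "simple", "robust", "scalable", "efficient"];

// Returns the vague qualifiers that occur as whole words in the criterion.
function vagueQualifiersIn(criterion: string): string[] {
  const words = criterion.toLowerCase().split(/[^a-z]+/);
  return VAGUE_QUALIFIERS.filter((q) => words.includes(q));
}

interface TestabilityClassification {
  severity: "error" | "warning";
  autoFixable: boolean;
  suggestedFix?: string;
}

// Returns undefined when the criterion contains no vague qualifier.
// Pass `threshold` when a concrete limit was found in the Technical Design
// (for example "under 30 seconds on CI"); omit it when nothing is inferrable.
function classifyCriterion(criterion: string, threshold?: string): TestabilityClassification | undefined {
  if (vagueQualifiersIn(criterion).length === 0) return undefined;
  if (threshold !== undefined) {
    return {
      severity: "warning",
      autoFixable: true,
      suggestedFix: `Replace vague qualifier with '${threshold}'.`,
    };
  }
  // No inferrable threshold: the user must rewrite the criterion.
  return { severity: "error", autoFixable: false };
}
```

Keyword matching will miss vague phrasing that uses other words, so a real check would likely need judgment beyond a fixed list.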
Plan mode (--mode plan)

| # | Check | What it detects | Auto-fixable? |
|---|---|---|---|
| P1 | Spec-plan coverage | Success criteria with no corresponding task(s) | Yes — add missing tasks |
| P2 | Task completeness | Tasks missing inputs, outputs, or verification | Yes — infer and fill in |
| P3 | Dependency correctness | Cycles in dependency graph; undeclared dependencies | Yes — add missing edges |
| P4 | Ordering sanity | Same-file tasks in parallel; consumers before producers | Yes — reorder |
| P5 | Risk coverage | Spec risks without mitigation in plan | Partially — add obvious, surface others |
| P6 | Scope drift | Plan tasks not traceable to any spec requirement | No — surface to user |
| P7 | Task-level feasibility | Undecided dependencies; tasks too vague to execute | No — surface to user |
P1 (spec-plan coverage)
Analyze: Spec's Success Criteria and plan's Tasks. Requires both documents.
Detection:
Classification: Always severity: "error", autoFixable: true. Fix: add task covering the criterion.
Example:
{
"id": "P1-001",
"check": "P1",
"title": "Spec criterion not covered by any plan task",
"detail": "Criterion #4 ('structured error responses with request-id') has no plan task.",
"severity": "error",
"autoFixable": true,
"suggestedFix": "Add task implementing structured error responses with request-id headers.",
"evidence": ["Spec Criteria #4", "Plan Tasks 1-8: no task references error format"]
}
P2 (task completeness)
Analyze: Each task in the Tasks section.
Detection: Verify each task has: (a) clear inputs, (b) clear outputs, (c) verification criterion. Flag tasks missing any element.
Classification: Always severity: "warning", autoFixable: true. Fix: infer the missing element from context (e.g., if a task says "create src/foo.ts" but has no verification, add "Run: npx vitest run src/foo.test.ts" if a test file exists, or "Run: tsc --noEmit" as minimal verification).
Example:
{
"id": "P2-001",
"check": "P2",
"title": "Task missing verification criterion",
"detail": "Task 3 has inputs and outputs but no verification step.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add: 'Run: npx vitest run src/services/notification-service.test.ts'",
"evidence": ["Task 3: no 'Run:' or 'Verify:' step", "Task 4 creates the test file"]
}
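The P2 check and its inferred fix can be sketched as follows. The PlanTask shape and both function names are hypothetical; the two fallback strings are the ones quoted in the classification rule above, and the ".ts to .test.ts" naming convention is an assumption.

```typescript
// Hypothetical task shape for illustration.
interface PlanTask {
  id: number;
  title: string;
  inputs?: string[];
  outputs?: string[];
  verification?: string;
}

type MissingElement = "inputs" | "outputs" | "verification";

// Lists which of the three required elements a task lacks.
function missingElements(task: PlanTask): MissingElement[] {
  const missing: MissingElement[] = [];
  if (!task.inputs || task.inputs.length === 0) missing.push("inputs");
  if (!task.outputs || task.outputs.length === 0) missing.push("outputs");
  if (!task.verification || task.verification.trim() === "") missing.push("verification");
  return missing;
}

// Infers a verification step for a task that creates `outputFile`.
// Assumes test files sit next to sources as "<name>.test.ts".
function inferVerification(outputFile: string, existingFiles: Set<string>): string {
  const testFile = outputFile.replace(/\.ts$/, ".test.ts");
  return existingFiles.has(testFile)
    ? `Run: npx vitest run ${testFile}`
    : "Run: tsc --noEmit"; // minimal verification when no test file exists
}
```

For a task that creates src/foo.ts, the inferred step depends only on whether src/foo.test.ts is among the known files.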
P3 (dependency correctness)
Analyze: "Depends on" declarations across all tasks, file paths/artifacts each task references.
Detection:
- With graph: run get_impact on output files to verify downstream consumers are declared as dependents.

Classification:
- Dependency cycles: severity: "error", autoFixable: false. Requires task restructuring.
- Missing dependency edges: severity: "warning", autoFixable: true. Fix: add "Depends on" declaration.

Example:
{
"id": "P3-002",
"check": "P3",
"title": "Missing dependency edge",
"detail": "Task 5 imports src/types/notification.ts (created by Task 1) but does not declare dependency.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add 'Depends on: Task 1' to Task 5.",
"evidence": [
"Task 5: imports notification.ts",
"File Map: created by Task 1",
"Task 5 Depends on: Task 4 only"
]
}
{
"id": "P3-001",
"check": "P3",
"title": "Dependency cycle detected",
"detail": "Tasks form a cycle: Task 3 -> Task 5 -> Task 3. Topological sort fails.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Break cycle by merging Tasks 3 and 5, or extract shared dependency into a new task.",
"evidence": [
"Task 3: 'Depends on: Task 5'",
"Task 5: 'Depends on: Task 3'",
"Topological sort failed"
]
}
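The cycle example mentions a failed topological sort. A minimal sketch of that idea, using Kahn's algorithm, is shown below; the function name and the Map-based input format are assumptions made for illustration, not something the skill prescribes.

```typescript
// dependsOn maps a task id to the ids it declares in "Depends on".
// Returns the ids that cannot be topologically sorted: tasks that sit on a
// cycle or depend (directly or indirectly) on one. Empty when the graph is acyclic.
function unsortableTasks(dependsOn: Map<number, number[]>): number[] {
  const unresolved = new Map<number, number>();   // task -> count of unmet dependencies
  const dependents = new Map<number, number[]>(); // task -> tasks waiting on it

  for (const [task, deps] of dependsOn) {
    if (!unresolved.has(task)) unresolved.set(task, 0);
    for (const dep of deps) {
      if (!unresolved.has(dep)) unresolved.set(dep, 0);
      unresolved.set(task, (unresolved.get(task) ?? 0) + 1);
      dependents.set(dep, [...(dependents.get(dep) ?? []), task]);
    }
  }

  // Repeatedly "complete" tasks with no unmet dependencies.
  const ready = [...unresolved].filter(([, count]) => count === 0).map(([task]) => task);
  while (ready.length > 0) {
    const done = ready.pop() as number;
    unresolved.delete(done);
    for (const next of dependents.get(done) ?? []) {
      const count = (unresolved.get(next) ?? 0) - 1;
      unresolved.set(next, count);
      if (count === 0) ready.push(next);
    }
  }

  // Whatever is left could never become ready.
  return [...unresolved.keys()].sort((a, b) => a - b);
}
```

For the P3-001 example, where Task 3 and Task 5 depend on each other, both ids remain in the result.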
P4 (ordering sanity)
Analyze: Task execution order, file paths each task touches, parallel opportunities.
Detection:
Classification: Always severity: "warning", autoFixable: true. Fix: reorder tasks or add dependency edges.
Example:
{
"id": "P4-001",
"check": "P4",
"title": "Consumer scheduled before producer",
"detail": "Task 2 imports from src/types/user.ts created by Task 4, with no dependency declared.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add 'Depends on: Task 4' to Task 2, or reorder type definition before Task 2.",
"evidence": ["Task 2: imports user.ts", "Task 4: creates user.ts", "Task 2 Depends on: none"]
}
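The consumer-before-producer case can be approximated with a pass over the planned order. This sketch assumes each task lists the files it creates and reads, which is a hypothetical structure; the ScheduledTask, OrderingIssue, and consumersBeforeProducers names are illustrative only.

```typescript
// Hypothetical task shape: file lists would have to be parsed from the plan.
interface ScheduledTask {
  id: number;
  creates: string[];   // files the task produces
  reads: string[];     // files the task imports or consumes
  dependsOn: number[]; // declared "Depends on" task ids
}

interface OrderingIssue { consumer: number; producer: number; file: string }

// `tasks` is given in planned execution order. A consumer is flagged when the
// task producing its file is scheduled later and no dependency is declared.
function consumersBeforeProducers(tasks: ScheduledTask[]): OrderingIssue[] {
  const position = new Map<number, number>();
  tasks.forEach((task, index) => position.set(task.id, index));

  const producerOf = new Map<string, number>();
  for (const task of tasks) {
    for (const file of task.creates) producerOf.set(file, task.id);
  }

  const issues: OrderingIssue[] = [];
  for (const task of tasks) {
    for (const file of task.reads) {
      const producer = producerOf.get(file);
      if (producer === undefined || producer === task.id) continue;
      const producerIsLater = position.get(producer)! > position.get(task.id)!;
      if (producerIsLater && !task.dependsOn.includes(producer)) {
        issues.push({ consumer: task.id, producer, file });
      }
    }
  }
  return issues;
}
```

Each reported issue carries enough information to produce a fix such as "Add 'Depends on: Task 4' to Task 2".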
P5 (risk coverage)
Analyze: Spec's risk-related content and plan's tasks/checkpoints.
Detection: Identify risks in: explicit "Risks" sections, decision rationale mentioning tradeoffs, success criteria implying failure modes, non-goals with adjacent risk. For each, check plan for: (a) mitigation task, (b) acknowledging checkpoint, or (c) explicit "accepted risk" note. Flag uncovered risks.
Classification:
- Obvious risk mitigation: severity: "warning", autoFixable: true. Fix: add mitigation task.
- Judgment-dependent mitigation: severity: "warning", autoFixable: false. Surface with options.

Example:
{
"id": "P5-001",
"check": "P5",
"title": "Spec risk has no mitigation in plan",
"detail": "Risk 'convergence loop may not terminate' has no plan task testing termination.",
"severity": "warning",
"autoFixable": true,
"suggestedFix": "Add task testing convergence termination with fixed-point inputs.",
"evidence": ["Spec Risks: 'loop may not terminate'", "Plan Tasks 1-8: no termination test"]
}
{
"id": "P5-002",
"check": "P5",
"title": "Risk requires design judgment to mitigate",
"detail": "Spec notes 'auto-fix may introduce new issues'. Mitigation depends on design choice: (a) rollback mechanism, (b) single-pass limit, or (c) human approval for cascading fixes.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Choose strategy: (a) rollback — add undo capability, (b) single-pass — simpler but less thorough, (c) human gate — safer but slower.",
"evidence": [
"Spec Risks: 'Auto-fixes may introduce new issues'",
"Decisions: no mitigation strategy"
]
}
P6 (scope drift)
Analyze: Plan tasks vs spec goals, success criteria, and technical design.
Detection: For each plan task, check traceability: (a) directly implements a criterion, (b) necessary prerequisite, or (c) infrastructure called for in spec. Flag untraceable tasks.
Classification: Always severity: "warning", autoFixable: false. User confirms whether each flagged task is in scope.
Example:
{
"id": "P6-001",
"check": "P6",
"title": "Plan task not traceable to spec requirement",
"detail": "Task 8 ('Add Redis caching layer') not traceable to any spec goal or criterion.",
"severity": "warning",
"autoFixable": false,
"suggestedFix": "Remove Task 8, or add corresponding goal/criterion to spec.",
"evidence": ["Task 8: 'Redis caching'", "Spec: no mention of caching"]
}
P7 (task-level feasibility)
Analyze: Each task's description, file paths, code snippets, referenced decisions.
Detection:
Classification: Always severity: "error", autoFixable: false. Requires planner revision.
Example:
{
"id": "P7-001",
"check": "P7",
"title": "Task depends on undecided design choice",
"detail": "Task 7 says 'implement caching layer' but Decisions table has no caching strategy entry.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Make caching decision in spec (e.g., 'D5: LRU with 5-min TTL'), then update Task 7.",
"evidence": ["Task 7: 'Implement caching layer'", "Decisions: no caching entry"]
}
{
"id": "P7-002",
"check": "P7",
"title": "Task too vague to execute in one context window",
"detail": "Task 4 says 'implement the notification service' without specifying methods, signatures, or error handling. Cannot complete without making design decisions.",
"severity": "error",
"autoFixable": false,
"suggestedFix": "Split into sub-tasks: (a) NotificationService.create() with signature/errors, (b) NotificationService.list() with filtering, (c) NotificationService.markRead() with idempotency.",
"evidence": [
"Task 4: 'Implement the notification service'",
"No signatures, no error spec",
"Iron law: every task completable in one context window"
]
}
For every finding where autoFixable: true:
For autoFixable: false: skip. They surface in Phase 4.
| Check | Auto-fixable findings | Fix behavior |
|---|---|---|
| S1 | None | Always surfaced |
| S2 | Missing traceability links | Silent fix |
| S2 | Orphan criteria | Surfaced — design decision |
| S3 | Obvious assumptions (runtime, encoding) | Silent fix |
| S3 | Ambiguous assumptions (concurrency, tenancy) | Surfaced — user chooses |
| S4 | Obvious error cases (file I/O, JSON, network) | Silent fix |
| S4 | Design-dependent error handling | Surfaced — user chooses strategy |
| S5 | None | Always surfaced |
| S6 | None | Always surfaced |
| S7 | Vague criteria with inferrable thresholds | Silent fix |
| S7 | Unmeasurable criteria | Surfaced — user rewrites |
| P1 | Missing task for uncovered criterion | Silent fix |
| P2 | Missing inputs, outputs, or verification | Silent fix |
| P3 | Missing dependency edges | Silent fix |
| P3 | Dependency cycles | Surfaced — design decision |
| P4 | File conflicts or consumer-before-producer | Silent fix |
| P5 | Obvious risk mitigation | Silent fix |
| P5 | Judgment-dependent mitigation | Surfaced — user chooses |
| P6 | None | Always surfaced |
| P7 | None | Always surfaced |
Rule: A fix is silent when the correct resolution requires no design judgment. If two or more plausible resolutions exist, surface it.
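As a rough illustration of the rule, the sketch below splits findings into those fixed silently and those surfaced. Treating "requires no design judgment" as "exactly one plausible resolution" is an interpretation of the rule, and the Finding shape and function names are hypothetical.

```typescript
// Reduced, hypothetical finding shape: only the fields this sketch needs.
interface Finding { id: string; autoFixable: boolean }

// Interpretation of the rule: a fix is silent only when exactly one
// plausible resolution exists. Zero or several means the user decides.
function isSilentFix(plausibleResolutions: string[]): boolean {
  return plausibleResolutions.length === 1;
}

// Splits findings into the Phase 2 (fix) and Phase 4 (surface) groups.
function partitionFindings(findings: Finding[]): { toFix: Finding[]; toSurface: Finding[] } {
  return {
    toFix: findings.filter((f) => f.autoFixable),
    toSurface: findings.filter((f) => !f.autoFixable),
  };
}
```

Under this reading, the S4-002 example with three candidate strategies (fail open, fail closed, cache) would always be surfaced.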
When: A goal has no corresponding success criterion.
Fix log example:
[S2-001] FIXED: Added criterion #11 for 'Support offline mode':
'App functions without network for all read operations, returning cached data.'
Derived from: Technical Design > Offline Cache.
When: Criterion uses vague qualifiers and Technical Design provides a threshold.
Fix log example:
[S7-001] FIXED: Replaced criterion #3 'build should be fast' with:
'Build completes in under 30 seconds on CI (per Technical Design > CI Config).'
When: Technical Design implies assumptions not documented in spec (e.g., fs.readFileSync implies Node.js).
Fix log example:
[S3-001] FIXED: Added assumption: 'Runtime: Node.js >= 18.x (LTS).'
Evidence: Technical Design references path.join, fs.readFileSync.
When: An operation has no error behavior and codebase has established pattern.
Fix log example:
[S4-001] FIXED: Added ENOENT error case for config read:
'If config missing, return defaults. Log debug message.'
Following: packages/core/src/config.ts pattern.
When: A spec criterion has no corresponding plan task.
Fix log example:
[P1-001] FIXED: Added Task 9 for criterion #5 (error logging):
'Create src/utils/error-logger.ts. Verify: npx vitest run error-logger.test.ts'
When: Task missing inputs, outputs, or verification.
Fix log example:
[P2-001] FIXED: Added verification to Task 3:
'Run: npx vitest run src/services/notification-service.test.ts'
When: Task B uses artifact from Task A without declaring dependency.
Fix log example:
[P3-001] FIXED: Added 'Depends on: Task 2' to Task 5.
Task 5 imports src/types/notification.ts created by Task 2.
When: Two tasks touch same file without sequencing, or consumer before producer.
Fix log example:
[P4-001] FIXED: Added 'Depends on: Task 4' to Task 2.
Both modify src/routes/index.ts. Sequencing prevents conflicts.
When: Spec risk has no plan coverage and mitigation is straightforward.
Fix log example:
[P5-001] FIXED: Added Task 10 for convergence termination testing.
Mitigates: 'convergence loop may not terminate'.
Every auto-fix MUST be logged:
[{finding-id}] FIXED: {one-line description}
{new text added/modified}
{source/evidence}
The fix log lets users review silent changes and trace causes if fixes introduce new issues.
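The three-line log format above could be produced by a helper along these lines. The FixLogEntry shape and the formatFixLogEntry name are hypothetical, and the sketch assumes one line per field with no indentation, since the original layout is not fully recoverable here.

```typescript
// Hypothetical entry shape mirroring the three placeholders in the format.
interface FixLogEntry {
  findingId: string;   // e.g. "S3-001"
  description: string; // one-line description of the fix
  newText: string;     // the text that was added or modified
  source: string;      // where the fix was derived from
}

// Renders an entry as "[id] FIXED: description", then new text, then source.
function formatFixLogEntry(entry: FixLogEntry): string {
  return [
    `[${entry.findingId}] FIXED: ${entry.description}`,
    entry.newText,
    entry.source,
  ].join("\n");
}
```

Keeping the finding id at the start of the first line makes it easy to trace a later regression back to the fix that caused it.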
After Phase 2 auto-fixes, the convergence loop determines whether further progress is possible.
1. Record the issue count from the previous pass as count_previous.
2. Re-run all checks and record the new total as count_current.
3. Compare:
   - count_current < count_previous: progress made. Go to Phase 2, apply new auto-fixes, return here.
   - count_current == count_previous: no progress. Remaining issues need user input. Proceed to Phase 4.
   - count_current > count_previous: fixes introduced new issues. Log warning, proceed to Phase 4.

A fix in one pass can make a previously non-auto-fixable finding become auto-fixable. Examples:
Spec-mode cascades:
Plan-mode cascades:
Cascading fixes are why the loop re-runs ALL checks, not just those that produced auto-fixable findings.
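The loop can be sketched in a few lines. This is a simplified model under stated assumptions: runAllChecks and applyFix are hypothetical callbacks standing in for Phase 1 and Phase 2, and the maxPasses safety cap is not something the skill specifies.

```typescript
// Reduced, hypothetical finding shape for this sketch.
interface Finding { id: string; autoFixable: boolean }

interface ConvergenceResult {
  passes: number;       // how many times all checks were run
  remaining: Finding[]; // findings left for Phase 4
  regressed: boolean;   // true when a pass ended with more issues than before
}

function converge(
  runAllChecks: () => Finding[],
  applyFix: (finding: Finding) => void,
  maxPasses = 20, // safety cap: an assumption, not stated in the skill
): ConvergenceResult {
  let findings = runAllChecks(); // Phase 1: initial pass
  let passes = 1;
  while (passes < maxPasses) {
    const countPrevious = findings.length;
    findings.filter((f) => f.autoFixable).forEach(applyFix); // Phase 2
    findings = runAllChecks(); // Phase 3: re-run ALL checks, not a subset
    passes++;
    const countCurrent = findings.length;
    if (countCurrent < countPrevious) continue; // progress: loop again
    // Equal means converged; greater means fixes introduced new issues.
    return { passes, remaining: findings, regressed: countCurrent > countPrevious };
  }
  return { passes, remaining: findings, regressed: false };
}
```

Because the loop only continues while the count strictly decreases, it stops after a pass that makes no progress, which is why a final pass with zero fixes is always run before convergence is declared.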
Pass 1 (initial):
S1: 0 | S2: 1 (auto-fix) | S3: 2 (1 auto-fix, 1 user) | S4: 1 (auto-fix)
S5: 0 | S6: 0 | S7: 1 (auto-fix)
Total: 5 (4 auto-fixable, 1 user). count_previous = 5
Phase 2: Apply 4 fixes.
[S2-001] Added criterion #11 for 'offline mode'.
[S3-001] Added Node.js runtime assumption.
[S4-001] Added ENOENT error case.
[S7-001] Replaced 'fast' with 'under 30 seconds on CI'.
Pass 2:
S2: 0 | S3: 1 CASCADING (UTF-8 assumption now appendable) + 1 user unchanged
S4: 0 | S7: 0
Total: 2 (1 auto-fixable, 1 user). count_current=2 < 5. Continue.
Phase 2: Apply 1 fix. [S3-003] Added UTF-8 assumption.
Pass 3: Total: 1 (0 auto-fixable, 1 user). count_current=1 < 2. Continue.
Phase 2: 0 fixes.
Pass 4: Total: 1. count_current=1 = count_previous=1. Converged.
→ Phase 4 with 1 remaining issue.
Pass 1 (initial):
P1: 1 (auto-fix) | P2: 1 (auto-fix) | P3: 0 | P4: 0
P5: 1 (user) | P6: 0 | P7: 1 (user)
Total: 4 (2 auto-fixable, 2 user). count_previous = 4
Phase 2: Apply 2 fixes.
[P1-001] Added Task 9 for criterion #6 (error logging).
[P2-001] Added verification to Task 4.
Pass 2:
P1: 0 | P2: 0 | P3: 1 CASCADING (Task 6 needs 'Depends on: Task 9')
P5: 1 user | P7: 1 user
Total: 3 (1 auto-fixable, 2 user). count_current=3 < 4. Continue.
Phase 2: [P3-001] Added 'Depends on: Task 9' to Task 6.
Pass 3: Total: 2 (0 auto-fixable). count_current=2 < 3. Continue.
Phase 2: 0 fixes.
Pass 4: Total: 2. count_current=2 = count_previous=2. Converged.
→ Phase 4 with 2 remaining issues.
The loop terminates because:
When findings remain after convergence, present them. If no needs-user-input findings remain, skip to Clean Exit.
- Order: error findings before warning findings. Errors block sign-off.
- Summary line: N remaining issues need your input (X errors, Y warnings).

For each finding, present three sections:
What is wrong:
[{id}] {title} ({severity})
{detail}
Evidence: {evidence[0]}, {evidence[1]}, ...
Why it matters:
- error: "Blocks sign-off. Must be resolved."
- warning: "Advisory. May dismiss with reason (logged)."

Suggested resolution:
Accepted responses:
- Resolve: the finding is marked resolved.
- Dismiss: logged as [{id}] DISMISSED: {reason}. Not re-surfaced.

Error findings cannot be dismissed.
Surfaced findings: N total
Resolved: X | Dismissed: Y | Pending: Z
Update after each response. When all addressed, proceed to Step 5.
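The surfacing rules (errors first, warnings dismissible with a reason, a running tally) can be modelled as below. The SurfacedFinding shape and all three function names are hypothetical; the output strings follow the formats quoted above.

```typescript
type Severity = "error" | "warning";
type Status = "pending" | "resolved" | "dismissed";

// Hypothetical shape: a finding plus its state in the user dialogue.
interface SurfacedFinding { id: string; severity: Severity; status: Status }

// Errors are presented before warnings; order within a group is preserved.
function sortForSurfacing(findings: SurfacedFinding[]): SurfacedFinding[] {
  const rank: Record<Severity, number> = { error: 0, warning: 1 };
  return [...findings].sort((a, b) => rank[a.severity] - rank[b.severity]);
}

// Only warnings may be dismissed, and only with a logged reason.
function dismiss(finding: SurfacedFinding, reason: string): string {
  if (finding.severity === "error") {
    throw new Error(`${finding.id}: error findings cannot be dismissed`);
  }
  if (reason.trim() === "") {
    throw new Error(`${finding.id}: a dismissal reason is required`);
  }
  finding.status = "dismissed";
  return `[${finding.id}] DISMISSED: ${reason}`;
}

// Renders the two-line progress tally.
function tally(findings: SurfacedFinding[]): string {
  const count = (status: Status) => findings.filter((f) => f.status === status).length;
  return [
    `Surfaced findings: ${findings.length} total`,
    `Resolved: ${count("resolved")} | Dismissed: ${count("dismissed")} | Pending: ${count("pending")}`,
  ].join("\n");
}
```

Throwing on an attempted error dismissal keeps the "errors block sign-off" guarantee from depending on caller discipline.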
All of the following must be true:
- No error findings pending or dismissed.

On clean exit:
CLEAN EXIT — all checks pass. Returning control to {parent skill} for sign-off.
Note: {N} warnings dismissed. See log.

| Check | Without graph | With graph |
|---|---|---|
| S5 | Grep/glob for referenced patterns | query_graph + get_relationships for dependency/architecture verification |
| S3 | Infer from codebase conventions | find_context_for for related design decisions |
| P1 | Text matching criteria to tasks | Graph traceability edges |
| P3 | Static analysis of task descriptions | get_impact for dependency completeness |
| P4 | Parse file paths, detect conflicts | Graph file ownership for accurate conflict detection |
All checks work from document analysis and codebase reads alone. Graph adds precision but is never required.
- harness validate: Run by parent skill before/after soundness review. This skill does not invoke validate directly.
- Parent skills: harness-brainstorming invokes --mode spec; harness-planning invokes --mode plan.
- Graph: when .harness/graph/ exists, use query_graph and get_impact for enhanced checks. Fall back to file-based reads otherwise.
- SoundnessFinding schema is defined in SKILL.md
- harness validate passes after all files are written

| Flag | Corrective Action |
|---|---|
| "The spec looks internally consistent at a high level" | STOP. S1 requires checking each decision against Technical Design line by line. "High level" consistency misses contradictions in the details. |
| "This assumption is obvious and doesn't need to be stated" | STOP. S3 exists because unstated assumptions cause the most damage when wrong. If it's obvious, writing it down costs nothing. Skipping it costs debugging time later. |
| "The finding is minor so I'll auto-fix it without surfacing to the user" | STOP. Only inferrable fixes are auto-fixed. If the fix involves a design choice — even one you think is obvious — surface it. You are not the designer. |
| // TODO: add traceability or // spec gap — fill later in spec/plan files | STOP. TODOs in specs are unfinished review. The spec is not converged. Fix the gap or surface it as a finding; do not defer it. |
Review-never-fixes: Soundness review identifies structural issues in specs and plans. It applies inferrable fixes (formatting, missing links, obvious gaps) but NEVER makes design decisions. If a finding requires judgment, surface it to the user — even if the "right" answer seems obvious. A reviewer who makes design decisions has stopped reviewing and started designing without the authority to do so.
When a check produces ambiguous results, classify the ambiguity immediately:
- Treat the finding as needing user input: autoFixable: false.

Do not auto-fix ambiguous findings. Ambiguity means you lack context; applying a "fix" without context is guessing.
Soundness check rubrics used internally MUST use compressed single-line format. Each check is one line with pipe-delimited fields:
mode|check-id|severity|criterion
Example (Spec Mode rubric):
spec|S1|error|No contradictions between decisions, technical design, and success criteria
spec|S2|warning|Every goal has at least one success criterion; no orphan criteria
spec|S3|warning|All implicit assumptions documented in Assumptions section
spec|S4|warning|Error/edge cases covered; EARS unwanted-behavior gaps filled
spec|S5|error|No references to nonexistent codebase capabilities or incompatible patterns
spec|S6|warning|No speculative features without requirement traceability
spec|S7|warning|All success criteria are observable and measurable with concrete thresholds
Why: Verbose check descriptions inflate review context without improving check accuracy. Dense single-line rubrics give the same signal in fewer tokens, leaving more budget for actual document analysis.
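A rubric line in this format is straightforward to parse and validate. The sketch below is one possible reading, not the skill's own parser: the RubricLine shape and parseRubricLine name are hypothetical, and allowing pipe characters inside the criterion text is an assumption.

```typescript
// Hypothetical parsed form of one "mode|check-id|severity|criterion" line.
interface RubricLine {
  mode: "spec" | "plan";
  checkId: string;
  severity: "error" | "warning";
  criterion: string;
}

function parseRubricLine(line: string): RubricLine {
  const [mode, checkId, severity, ...rest] = line.split("|");
  // Assumption: any further pipes belong to the criterion text.
  const criterion = rest.join("|").trim();
  if (mode !== "spec" && mode !== "plan") {
    throw new Error(`invalid mode: ${mode}`);
  }
  if (!checkId) {
    throw new Error("missing check id");
  }
  if (severity !== "error" && severity !== "warning") {
    throw new Error(`invalid severity: ${severity}`);
  }
  if (criterion === "") {
    throw new Error("missing criterion");
  }
  return { mode, checkId, severity, criterion };
}
```

Validating mode and severity at parse time catches malformed rubric lines before any check runs against them.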
Rules:
- mode: spec or plan
- severity: error or warning

| Rationalization | Reality |
|---|---|
| "The spec looks coherent to me, so I can skip running the S1 internal coherence check" | Every check in the mode must run. S1 detects contradictions that human review frequently misses. |
| "This unstated assumption is obvious, so documenting it would be pedantic" | S3 exists because "obvious" assumptions cause the most damage when wrong. Cheapest to document, most expensive to miss. |
| "The success criterion is somewhat vague but the team will know what it means" | S7 flags vague criteria like "should be fast" because they are untestable. Vague criteria survive brainstorming only to fail at verification. |
| "This auto-fixable finding is minor, so I will just note it rather than applying the fix" | Auto-fixable findings should be applied silently — that is the design intent. Skipping them ships known inferrable gaps. |
| "The feasibility check found a signature mismatch but the code can probably be adapted during execution" | S5 red flags are always severity "error" and always surfaced. A spec referencing nonexistent modules produces a broken plan. |
| "The convergence loop is taking too long, so I will skip the re-check and declare converged" | Convergence requires the issue count to stop decreasing. Declaring convergence without a re-check is falsifying the exit condition. |
| "This spec is well-written enough that a soundness review would not find anything" | Every spec gets a soundness review. Well-written specs still have unstated assumptions (S3) and vague criteria (S7). The review is not optional. |
Context: harness-brainstorming has drafted a spec and is about to sign off.
Invoking harness-soundness-review --mode spec...
Phase 1: CHECK
S1 (internal coherence)... 0 findings
S2 (goal-criteria traceability)... 1 finding (auto-fixable)
S3 (unstated assumptions)... 2 findings (2 need user input)
S4 (requirement completeness)... 1 finding (auto-fixable)
S5 (feasibility red flags)... 0 findings
S6 (YAGNI re-scan)... 0 findings
S7 (testability)... 1 finding (auto-fixable)
5 findings total: 3 auto-fixable, 2 need user input.
Phase 2: FIX
[S2-001] FIXED: Added success criterion for 'Support offline mode' goal.
[S4-001] FIXED: Added ENOENT error case for config file read.
[S7-001] FIXED: Replaced 'build should be fast' with 'completes in under 30 seconds on CI'.
3 auto-fixes applied.
Phase 3: CONVERGE
Re-running checks...
S3-001 now auto-fixable (S4-001 created Assumptions section).
[S3-001] FIXED: Added Node.js runtime assumption.
1 additional fix. Re-checking...
Issue count: 1 (was 2). Decreased — continuing.
Re-checking... Issue count: 1 (unchanged). Converged.
Phase 4: SURFACE
1 remaining issue:
[S3-002] Ambiguous concurrency model (warning)
Technical Design describes background job processor without specifying
in-process, worker thread, or separate process.
→ Add decision to Decisions table.
User resolves → adds decision: "in-process event loop"
Re-running checks... 0 findings.
CLEAN EXIT — returning control to harness-brainstorming for sign-off.
Context: harness-planning has drafted a plan and is about to sign off.
Invoking harness-soundness-review --mode plan...
Phase 1: CHECK
P1 (spec-plan coverage)... 1 finding (auto-fixable)
P2 (task completeness)... 2 findings (auto-fixable)
P3 (dependency correctness)... 1 finding (auto-fixable)
P4 (ordering sanity)... 0 findings
P5 (risk coverage)... 1 finding (needs user input)
P6 (scope drift)... 0 findings
P7 (task-level feasibility)... 1 finding (needs user input)
6 findings total: 4 auto-fixable, 2 need user input.
Phase 2: FIX
[P1-001] FIXED: Added Task 9 covering criterion #5 (error logging).
[P2-001] FIXED: Added verification step to Task 3.
[P2-002] FIXED: Added outputs to Task 6.
[P3-001] FIXED: Added 'Depends on: Task 2' to Task 5.
4 auto-fixes applied.
Phase 3: CONVERGE
Re-checking... Issue count: 2 (was 6). Decreased — continuing.
Re-checking... Issue count: 2 (unchanged). Converged.
Phase 4: SURFACE
2 remaining issues:
[P5-001] Spec risk 'performance vs correctness' has no mitigation (warning)
→ Add performance benchmark task, relax validation, or accept risk.
[P7-001] Task 7 depends on undecided caching strategy (error)
→ Make caching decision in spec, then update Task 7.
User resolves P5-001 → adds Task 10 for performance benchmark.
User resolves P7-001 → adds LRU cache decision, updates Task 7.
Re-running checks... 0 findings.
CLEAN EXIT — returning control to harness-planning for sign-off.
These are hard stops. Violating any gate means the process has broken down.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude