From ywc-agent-toolkit
Validates specification completeness, consistency, feasibility, and readiness for implementation before task decomposition. Triggers on spec review requests like "사양 검토" or "check the spec".
Slash command
/ywc-agent-toolkit:ywc-spec-validate
**Announce at start:** "I'm using the ywc-spec-validate skill to validate specification completeness, consistency, and feasibility."
Spec Reviewer Agent Skill for specification quality validation.
When tempted to skip a step, check this table first:
| Excuse | Reality |
|---|---|
| "Spec is mostly clear, no need to assess Feasibility" | Feasibility and Code Compatibility require project context. Always read CLAUDE.md, package.json, and existing code first. |
| "Spec gaps are obvious, just list them and finish" | Each gap must specify which dimension (completeness / consistency / feasibility / compatibility) and propose a concrete clarifying question. |
| "User wrote the spec, do not push back too hard" | Honest review prevents implementation rework. Soften tone, never soften findings. |
| "Spec uses internal jargon, infer the intent" | If a term is undefined, that is a completeness gap. Flag it. |
| "Architecture decision is implicit, do not flag it" | Implicit ≠ specified. Surface implicit assumptions for explicit confirmation. |
| "Spec follows best practices, so complexity is fine" | The 4 base dimensions catch what is missing or conflicting, never what is excessive. Abstraction, configurability, or generality the stated scope does not yet require is over-engineering — surface it as a Simplicity Warning. Would a senior engineer call this spec overbuilt? |
| "The spec contradicts CLAUDE.md, follow the spec" | Surface the conflict. Do not silently let the spec override project rules. |
| "4-dimension review is fast enough sequentially" | Phase 1 fan-out cuts latency on large specs; each Sonnet subagent handles one dimension, reducing per-call context and preventing cross-dimension finding contamination. |
Violating the letter of these rules is violating the spirit. A spec review that finds nothing is not a review.
| Parameter | Format | Example | Description |
|---|---|---|---|
| --spec | --spec <path> | --spec docs/outline/02-api.md | Specification file path (required) |
| --mode | --mode standard\|socratic | --mode socratic | Output style. standard (default) returns the finding report and gate verdict; socratic returns learning questions instead. See references/socratic-mode.md. |
| --focus | --focus <area> | --focus architecture | Optional focus hint. When Council Escalation triggers, the generic 4 voices are replaced with domain expert profiles for the chosen area. Valid: requirements, architecture, testing, compliance. See references/expert-profiles.md. |
| --format | --format markdown\|html | --format html | Output format. Default markdown. With html, writes a self-contained HTML report to claudedocs/. See html-output.md. Orthogonal to --mode. |
| --advisor-budget | --advisor-budget <n> | --advisor-budget 0 | Opus advisor escalation budget for this invocation. Default: current behavior (≤2). 0 disables Phase 2 escalation entirely: ambiguous advisor candidates are reported as normal Suggestions (with a "would have escalated but budget=0" note) instead of being escalated. Used by orchestrators (e.g. ywc-spec-ready) to cap cumulative advisor cost across iterations. |
1. Collect Project Context: Read the following before touching the spec file. The Feasibility and Code Compatibility dimensions cannot be assessed without this context:
   - CLAUDE.md (or the nearest parent CLAUDE.md): language/framework constraints, commit conventions, any explicit out-of-scope rules
   - package.json / pyproject.toml / go.mod: runtime, major dependencies, and their versions
   - Relevant directories (src/, app/, prisma/, migrations/)
   - docs/ubiquitous-language.md (if it exists): canonical domain terms and their "Synonyms to Avoid"; any spec term matching a Synonyms entry is a Consistency finding

   If both CLAUDE.md and package.json are absent, note "project context unavailable" and flag all Feasibility findings as lower-confidence.
2. Read Specification File: Confirm the --spec path exists before reading. If the file does not exist, stop immediately with BLOCKED status and report the missing path.
3. Check Code Compatibility: Use Grep/Glob to search for related DB schemas, API routes, and type definitions to detect conflicts with the spec. Scope the search to the directories identified in Step 1.
3.5. Precedent-Completeness Re-grounding (MANDATORY omission check — distinct from the conflict check in Step 3). Step 3 / Code Compatibility detect where a spec claim conflicts with code. They structurally cannot detect an omission: an existing integration site the spec never names makes no claim, so nothing conflicts — yet a dropped integration site silently breaks the feature on the un-modeled runtime path while every written claim still validates cleanly. (This is the failure class that survives repeated spec-validation; re-reading spec text can never reveal a site the text never names.)
For every existing pattern the spec mirrors / follows / 踏襲 / extends (a named precedent X), you MUST build a Precedent Site Coverage table before writing the report — do not reason about it in prose, build the table:
Grep X's own identifier(s)/marker(s) across the actual code: grep -rn "<precedentFn>\|<precedentMarker>" <src> (exclude test files). Integration patterns (snippet/script injection, hooks, interceptors, lifecycle handlers) routinely span a generation step, a preview/transform step, and a publish/render-time step — enumerate all of them; the publish/render-time site is the one a memory-driven spec most often drops.
Emit one row per site X integrates at:
| Precedent site (file:line + what runs there) | Spec coverage: Replicated / Deferred-with-reason / OMITTED |
|---|---|
Every grep hit MUST appear as a row. Decide each row's coverage by searching the spec for that specific site — never by trusting the spec's own prose summary that it "follows X".
Every row marked OMITTED is its own Completeness Critical. Do not fold multiple omitted sites into one finding, and do not let a louder conflict elsewhere (e.g. a stale-anchor or deprecated-path finding) substitute for explicitly naming the omitted site — an omission and a conflict are different defects and both must be reported. Each OMITTED Critical names the omitted site's file:line and the runtime path on which the feature fails.
Include the coverage table in the report, and feed each OMITTED row into the Completeness dimension of Phase 1 as a separate Critical.
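The table-building rule in Step 3.5 can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `Site` type, the function names, and the naive "spec text mentions the file" check are all hypothetical stand-ins for the per-site search the step requires, and the grep hits are assumed to be collected already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    file: str
    line: int
    what: str  # what runs at this site


def coverage_rows(sites, spec_text, deferred=None):
    """Return one (site, coverage) row per grep hit; rows are never merged."""
    deferred = deferred or {}  # hypothetical map: "file:line" -> stated reason
    rows = []
    for s in sites:
        key = f"{s.file}:{s.line}"
        if key in deferred:
            coverage = f"Deferred-with-reason ({deferred[key]})"
        elif s.file in spec_text:
            # Assumption: naming the file counts as addressing the site.
            coverage = "Replicated"
        else:
            coverage = "OMITTED"
        rows.append((s, coverage))
    return rows


def omitted_criticals(rows):
    """Turn each OMITTED row into its own Completeness Critical."""
    return [
        f"[{s.file}:{s.line}] Completeness Critical: spec omits precedent site ({s.what})"
        for s, cov in rows
        if cov == "OMITTED"
    ]


def render_table(rows):
    """Render the rows as a Markdown coverage table."""
    header = "| Precedent site (`file:line` + what runs there) | Spec coverage |\n|---|---|"
    body = "\n".join(f"| `{s.file}:{s.line}` {s.what} | {cov} |" for s, cov in rows)
    return header + "\n" + body
```

In practice the coverage decision needs judgment that a substring check cannot supply; the point of the sketch is only that every hit gets a row and every OMITTED row yields a separate finding.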
4. Phase 1 (Parallel Dimension Review): Use the Task tool to spawn 4 Sonnet subagents in parallel, one per review dimension. Pass each subagent the project context from Step 1 and the spec text from Step 2:
| Subagent | Model | Dimension | Focus |
|---|---|---|---|
| Completeness | sonnet | Completeness | Missing required items, edge cases, pagination |
| Consistency | sonnet | Consistency | Terminology mismatches; Synonyms-to-Avoid violations |
| Feasibility | sonnet | Feasibility + Simplicity | Implementable with current stack and dependencies; also flag over-engineering — abstraction, configurability, or generality the stated scope does not yet require (surface as Warning) |
| Code Compatibility | sonnet | Code Compatibility | Conflicts with existing schema, API routes, type definitions |
Each subagent returns:
Step 4b — Aggregate: Combine and deduplicate findings by {file}:{line}. Cap advisor candidates at advisor_budget (default: 2), prioritizing Critical over Warning. Log any dropped candidates in the report.
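As a rough sketch of the aggregation step, assuming each finding is a dict with hypothetical `file`, `line`, `severity`, and `advisor_candidate` keys (the subagent return shape is not specified here), deduplication and budget capping might look like this:

```python
SEVERITY_RANK = {"Critical": 0, "Warning": 1, "Suggestion": 2}


def aggregate(findings, advisor_budget=2):
    """Deduplicate by (file, line), then cap advisor candidates at the budget."""
    merged = {}
    for f in findings:
        key = (f["file"], f["line"])
        kept = merged.get(key)
        # Assumption: when two findings share a key, keep the more severe one.
        if kept is None or SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[kept["severity"]]:
            merged[key] = f
    unique = list(merged.values())

    candidates = [f for f in unique if f.get("advisor_candidate")]
    # Stable sort: Critical before Warning, original order preserved within a tier.
    candidates.sort(key=lambda f: SEVERITY_RANK[f["severity"]])
    escalate = candidates[:advisor_budget]
    dropped = candidates[advisor_budget:]  # to be logged in the report
    return unique, escalate, dropped
```

How duplicates from different dimensions should be merged is a design choice the step leaves open; keeping the more severe finding is just one plausible policy.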
The 4-dimension analysis runs the same way regardless of --mode; the mode only changes how findings are presented in step 5.
5. Output Report: branch on --mode
   - standard (default): emit the severity-classified finding report in the format below.
   - socratic: emit a learning-question list per references/socratic-mode.md. Findings still drive question selection; they are just reshaped from "fix X" into "what would happen if X?". Completion Status is SOCRATIC and the gate score is informational only.

| Dimension | Review Focus |
|---|---|
| Completeness | Missing required items (error handling, edge cases, pagination, etc.) |
| Consistency | Terminology, format, or data structure mismatches across documents; spec terms appearing in the "Synonyms to Avoid" column of docs/ubiquitous-language.md |
| Feasibility | Whether the spec can be implemented with the current stack |
| Code Compatibility | Conflicts with existing DB schema, API route patterns |
| Simplicity | Whether the spec specifies abstraction, configurability, or generality the stated scope does not yet require (over-engineering). Assessed within the Feasibility pass — surface speculative scope as a Warning, not a separate subagent. |
## Spec Review Result: {filename}
### Summary
- Critical: N, Warning: M, Suggestion: K
- Phase 2 advisor calls used: X of N ({single-advisor|council|none}) (N = applied budget; default 2, or the `--advisor-budget` value)
### Precedent Site Coverage
| Precedent site (`file:line` + what runs there) | Spec coverage: Replicated / Deferred-with-reason / OMITTED |
|---|---|
| ... | ... |
### Critical Issues
1. [{file}:{line}] [P1|P2] Description — Suggested improvement
(if P2) Advisor rationale: {one-line rationale}
### Warning Issues
1. [{file}:{line}] [P1|P2] Description — Suggested improvement
### Suggestions
1. [{file}:{line}] Description — Suggested improvement
### Completion Status
(One of: DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT)
Completion Status rules:
| Status | When to use |
|---|---|
| DONE | Review complete, no Critical findings |
| DONE_WITH_CONCERNS | Review complete but Critical issues were found; the spec needs revision before task decomposition |
| BLOCKED | Review cannot proceed: spec file missing, or a Phase 2 advisor returned an inconclusive verdict |
| NEEDS_CONTEXT | --spec argument is missing or the file is empty/unreadable |
| SOCRATIC | --mode socratic was used; output is a learning-question list, not a gate verdict. Downstream skills (especially ywc-task-generator) must not consume this status as a handoff signal. |
Re-plan handoff on DONE_WITH_CONCERNS — after printing the report body, append the following one-liner so the user (or an interactive orchestrator) can resolve findings via ywc-plan Re-plan Mode instead of rewriting the spec from scratch:
To address the Critical findings above without losing validated sections, run:
/ywc-plan --update-spec <spec-path> --failure-context "<one-paragraph summary of the Critical findings>"
This routes findings into the ## Iteration N Amendments append flow described in ywc-plan Step 4c, preserving validated portions of the spec. Print this line only on DONE_WITH_CONCERNS — DONE, BLOCKED, NEEDS_CONTEXT, and SOCRATIC do not benefit from Re-plan Mode. For programmatic consumers (e.g., ywc-agentic), the Programmatic Consumer Policy below already invokes the equivalent flow; this surface is for interactive users who would otherwise re-write from scratch.
HTML mode (--format html): emits the same findings as a self-contained HTML report with severity color coding, tab navigation, and a Copy as Markdown button. Structure and conventions follow html-output.md. The Markdown surface is preserved inside the file, so downstream integration is unaffected.
This skill runs Phase 1 as 4 parallel Sonnet subagents (one per dimension) and aggregates their findings. For a small number of genuinely ambiguous findings from Phase 1, the orchestrator escalates to an Opus advisor using the Task tool with model: opus. This follows Pattern B from advisor-pattern.md — parallel executors in Phase 1, frontier judgment applied only where it actually matters in Phase 2.
Budget: up to 2 Opus advisor calls per invocation by default. Unused budget is good. Spec review is smaller in scope than impl-review, so the cap is tighter.
--advisor-budget <n> override: when set, <n> is the hard ceiling on Phase 2 Opus calls for this invocation (default 2). The report header's Phase 2 advisor calls used: X of N line reflects the applied budget as N so an orchestrator can track cumulative spend across iterations. When --advisor-budget 0, Phase 2 escalation is disabled entirely: each finding that would have escalated is instead reported as a normal Suggestion carrying a one-line would have escalated but budget=0 rationale — it is never silently dropped. This lets a loop driver (e.g. ywc-spec-ready) run cheap deterministic passes and spend advisor budget only when it chooses to.
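The budget override described above could be modeled along these lines. This is a sketch under assumptions: the candidate dicts, the `note` key, and both function names are hypothetical, and only the budget semantics (hard ceiling, and downgrade to Suggestion at 0) come from the text.

```python
def apply_advisor_budget(candidates, advisor_budget=2):
    """Split escalation candidates into (escalated, downgraded) lists."""
    if advisor_budget < 0:
        raise ValueError("advisor_budget must be >= 0")
    if advisor_budget == 0:
        # Escalation disabled: report every candidate as a Suggestion with a note,
        # so nothing is silently dropped.
        downgraded = [
            {**c, "severity": "Suggestion", "note": "would have escalated but budget=0"}
            for c in candidates
        ]
        return [], downgraded
    # The budget is a hard ceiling on Phase 2 calls.
    return candidates[:advisor_budget], []


def header_line(used, budget, kind):
    """Format the report header line so N reflects the applied budget."""
    return f"- Phase 2 advisor calls used: {used} of {budget} ({kind})"
```

For example, `header_line(0, 0, "none")` yields the line an orchestrator would read when it ran the pass with `--advisor-budget 0`.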
Escalation conditions — each must satisfy all three properties from advisor-pattern.md §5 (objective trigger, irreversibility, ambiguity):
Context payload rules (critical for cost discipline):
Non-goals — do not escalate for these:
Report escalations in the output: mark Phase 2 findings with [P2] prefix and include the advisor's one-line rationale. This preserves auditability of which findings involved frontier judgment.
When a finding involves a genuine architectural trade-off — where two reasonable positions exist and the wrong choice has significant implementation cost — the single Opus advisor may be replaced by a 4-voice Council review. Use council escalation when all three conditions hold:
Council voices (spawn one Opus subagent per voice, passing only the specific trade-off excerpt — not the full spec):
| Voice | Role | Anti-bias Mandate |
|---|---|---|
| Architect | Correctness and long-term maintainability | Challenge convenience-driven choices |
| Skeptic | Break the dominant assumption; find what everyone is taking for granted | Ignore sunk cost arguments |
| Pragmatist | Shipping speed and team execution risk | Ignore theoretical purity |
| Critic | Failure modes; what breaks at scale or under adversarial conditions | Assume the happy path is wrong |
Anti-anchoring rule: Each voice receives only the trade-off description — not the other voices' verdicts. Collect all four independently before synthesizing.
Council output format (append to the relevant finding in the report):
[COUNCIL] {trade-off description}
- Architect: {verdict, ≤2 sentences}
- Skeptic: {verdict, ≤2 sentences}
- Pragmatist: {verdict, ≤2 sentences}
- Critic: {verdict, ≤2 sentences}
Consensus: {agreed recommendation, or "No consensus — strongest dissent: <voice>: <reason>"}
When --focus <area> is set on the invocation and Council Escalation triggers, replace the 4 generic voices above with 3–4 domain expert profiles from references/expert-profiles.md. The Anti-anchoring rule, output format, and budget rule all apply unchanged — only the voice labels and heuristics change.
| --focus value | Profiles to use (in order) |
|---|---|
| requirements | Requirements Quality + Specification by Example + Use Case Modeling |
| architecture | Interface Design + Service Boundaries + Integration Patterns + Production Resilience |
| testing | Risk-Based Testing + Three Amigos + Quality Attribute |
| compliance | Production Resilience + Cloud-Native Operations + Audit Trail |
Use generic Council (no --focus) when the trade-off spans two or more areas, or when the spec covers multiple domains and no single focus dominates. Mixing generic voices and focus profiles within a single Council session breaks anti-anchoring — pick one mode per session.
Council escalation consumes the full 2-advisor budget. Do not use council and single-advisor escalation in the same invocation.
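The voice-selection rule for a Council session can be illustrated with a small lookup. The profile names are taken from the table above; the function name and the choice to reject unknown focus values are assumptions made for the sketch.

```python
GENERIC_VOICES = ["Architect", "Skeptic", "Pragmatist", "Critic"]

FOCUS_PROFILES = {
    "requirements": ["Requirements Quality", "Specification by Example", "Use Case Modeling"],
    "architecture": ["Interface Design", "Service Boundaries", "Integration Patterns",
                     "Production Resilience"],
    "testing": ["Risk-Based Testing", "Three Amigos", "Quality Attribute"],
    "compliance": ["Production Resilience", "Cloud-Native Operations", "Audit Trail"],
}


def council_voices(focus=None):
    """Return the voices for one Council session: generic or focus profiles, never mixed."""
    if focus is None:
        return list(GENERIC_VOICES)
    if focus not in FOCUS_PROFILES:
        # Assumption: an unrecognized focus is an error rather than a silent fallback.
        raise ValueError(f"unknown --focus value: {focus!r}")
    return list(FOCUS_PROFILES[focus])
```

Returning a single list per call mirrors the "pick one mode per session" rule: there is no code path that combines generic voices with focus profiles.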
This skill applies the Confidence Gate before declaring the spec ready for task-generator.
Required dimensions (must each score ≥ 70):
Band-to-status mapping for this skill:
| Gate band | Completion status | Action |
|---|---|---|
| PROCEED (≥ 90) | DONE | Spec is ready for task-generator. |
| REVIEW (70 – 89) | NEEDS_CONTEXT | Present 1-3 open questions; consider council escalation if multiple dimensions are < 80. |
| STOP (< 70) | BLOCKED | Report which dimensions are weak; do not hand off to task-generator. |
The gate score must appear in the report header alongside the Critical/Warning/Suggestion finding counts. A required dimension scoring below 50 forces STOP regardless of aggregate.
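The band mapping and the below-50 override can be expressed as a short function. This is a sketch: how the aggregate score is computed is not stated here, so it is taken as an input, and the function and parameter names are hypothetical. The handling of a required dimension scoring between 50 and 69 is also not spelled out, so the sketch does not model it.

```python
def gate(aggregate, dimension_scores):
    """Map a gate score to (band, completion status)."""
    # A required dimension below 50 forces STOP regardless of the aggregate.
    if any(score < 50 for score in dimension_scores.values()):
        return "STOP", "BLOCKED"
    if aggregate >= 90:
        return "PROCEED", "DONE"
    if aggregate >= 70:
        return "REVIEW", "NEEDS_CONTEXT"
    return "STOP", "BLOCKED"
```

With these rules, an aggregate of 95 still yields STOP when any one required dimension scores 45.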
- ywc-spec-writer: validates docs/specification/ section files (01-overview.md … 07-glossary.md). All 4 review dimensions (completeness, consistency, feasibility, code compatibility) apply fully.
- ywc-plan: can validate feature plan documents in docs/ywc-plans/. Completeness and consistency dimensions apply; feasibility and code compatibility apply partially because plan documents are less implementation-specific.
- ywc-task-generator output (tasks/): task-generator consumes a validated spec; it does not produce one. Passing a task directory as --spec is a misuse.
- ywc-task-generator: hand off only when completion status is DONE. DONE_WITH_CONCERNS requires spec revision first.

When consumed by an orchestrator that cannot prompt for human input (e.g., ywc-agentic):
| Completion Status | Action |
|---|---|
| DONE | Proceed to ywc-task-generator |
| DONE_WITH_CONCERNS | Re-invoke ywc-plan with --update-spec + --failure-context + --non-interactive (max 1 retry), then re-validate |
| BLOCKED | Stop execution; surface a structured triage report containing: (1) attempted triage steps, (2) verbatim blocker text, (3) proposed default action |
| NEEDS_CONTEXT | Stop execution; surface a structured triage report containing: (1) attempted triage steps, (2) verbatim blocker text, (3) proposed default action |
| SOCRATIC | Stop execution and report to user — SOCRATIC output is not a handoff signal; re-run without --mode socratic to obtain a gate verdict |
Looped consumer: the table's max 1 retry is the default for non-loop consumers (e.g. ywc-agentic Step 3 historically). When multi-iteration convergence is required, the dedicated ywc-spec-ready skill drives the DONE_WITH_CONCERNS → ywc-plan --update-spec → re-validate cycle repeatedly under its own --max-iterations cap and stall guards, and passes the remaining advisor budget via --advisor-budget. Use ywc-spec-ready instead of hand-rolling more than one retry.
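For a non-interactive consumer, the policy table reduces to a dispatch on completion status. The sketch below is illustrative only: the returned action labels are hypothetical strings, and the behavior once the single retry is used up is an assumption, since the table does not say what happens then.

```python
def next_action(status, retries_used=0, max_retries=1):
    """Pick the orchestrator's next step from a completion status."""
    if status == "DONE":
        return "proceed:ywc-task-generator"
    if status == "DONE_WITH_CONCERNS":
        if retries_used < max_retries:
            # Re-plan non-interactively, then re-validate.
            return "replan:ywc-plan --update-spec --failure-context --non-interactive"
        # Assumption: stop once the retry limit is reached.
        return "stop:retry-limit"
    if status in ("BLOCKED", "NEEDS_CONTEXT"):
        return "stop:triage-report"
    if status == "SOCRATIC":
        # SOCRATIC output is not a handoff signal.
        return "stop:report-to-user"
    raise ValueError(f"unknown completion status: {status!r}")
```

A loop driver that needs more than one retry would raise `max_retries` here only conceptually; the text above recommends delegating that loop to ywc-spec-ready instead.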
npx claudepluginhub yongwoon/ywc-agent-toolkit