From ywc-agent-toolkit
Gates implementation start by scoring readiness across five dimensions (scope clarity, architecture compliance, evidence quality, reuse verified, root cause identified) with PROCEED/REVIEW/STOP bands. Use before any non-trivial implementation or invoking code-gen skills.
Slash command
/ywc-agent-toolkit:ywc-confidence-gate
**Announce at start:** "I'm using the ywc-confidence-gate skill to score readiness across five dimensions before implementation begins."
This skill is the canonical pre-implementation gate. It exists because every implementation skill (ywc-code-gen, ywc-sequential-executor, ywc-parallel-executor) starts producing code immediately when invoked — and the cost of code produced from a half-understood spec is borne later by ywc-impl-review, CI, or production. A 5-minute confidence score catches the same defect classes that a 30-minute re-plan and a 2-hour debug session would catch — at a fraction of the cost.
The scoring rubric and band definitions are shared with ywc-impl-review and other downstream skills via ../references/confidence-gate.md. This skill does not redefine the rubric; it applies it at the pre-implementation entry point.
NO IMPLEMENTATION WITHOUT AN EXPLICIT CONFIDENCE SCORE AND BAND DECISION
If the band is REVIEW or STOP (whether because the aggregate score is below 90 or because the single-dimension override dropped it), implementation does not begin until the weakest dimension has been raised (or the user has explicitly accepted the residual risk and the agent has surfaced what cannot be raised). "I'll figure it out as I go" is not a band decision; it is the failure mode this skill exists to prevent.
When tempted to skip the gate, check this table first:
| Excuse | Reality |
|---|---|
| "The user already approved the spec, that means PROCEED" | Spec approval is one signal among five (Scope clarity). Architecture compliance, evidence quality, reuse, and root cause are independent — a spec can be perfectly clear and still rest on an unverified architectural assumption. Score all five before deciding. |
| "I have high confidence overall — skipping the per-dimension scoring is fine" | "Overall" confidence is the failure mode. The point of the rubric is to surface the weakest dimension; aggregate-only scoring lets a strong dimension (e.g., scope clarity 95) mask a fatal weakness elsewhere (e.g., reuse verified 40 because no one searched the existing utils). The single-dim-below-50 rule is what catches these. |
| "Scoring is bureaucracy when the change is small" | A 5-minute gate on a 5-line change is fast. A 2-hour debug session on the same 5-line change because no one checked if a utility already exists is slow. The gate scales down for small changes (most dimensions score 95+ trivially), but it must still be executed and surfaced. |
| "I'll score after to validate that proceeding was right" | Scoring after the fact is rationalization, not evidence. The point is to score before the decision so the score informs it. Post-hoc scores always conveniently clear the threshold. |
| "Evidence quality dimension is hard to score precisely — I'll round up" | "Round up to clear the threshold" is the most common gate-defeating move. A dimension at 65 is REVIEW; rounding to 70 to PROCEED is a documented anti-pattern in the shared rubric (§7). Score honestly; if the band is REVIEW, present alternatives — that is the correct outcome. |
| "REVIEW band wastes time on questions the user already has answers to" | REVIEW band exists precisely to surface those answers explicitly so they bind the implementation. If the user has them in their head but they are not in the spec, the implementation will diverge. Present the 1–3 alternatives or open questions; the user clears them in seconds. |
| "STOP band is for catastrophic situations only" | STOP fires whenever aggregate < 70 or any required dimension < 70. It is not catastrophic — it is "the current evidence does not support starting; raise it first". Treat STOP as "go back to spec / research / context", not as "abandon project". |
| "If I report a sub-90 band, the user will think I'm being slow" | The opposite. Reporting a REVIEW band with the weakest dimension named saves the user the rework cycle they would otherwise pay for. The gate is a transparency mechanism — the alternative is opaque overconfidence that surfaces as broken CI hours later. |
| "I can skip the gate when running from inside another skill (e.g., ywc-agentic loop)" | Skill-to-skill delegation is the most dangerous entry point — the upstream context narrows what is in scope, and the downstream skill may not see the spec the user actually wrote. Always re-score at the boundary; do not inherit confidence from an upstream caller. |
| "The five dimensions don't apply to my type of work" | They are deliberately abstract so they apply across feature work, bug fixes, refactors, and infra. If a dimension feels not-applicable, score it generously (90+) — but score it. Removing a dimension defeats the comparability of scores across skills, which is the rubric's primary value. |
Violating the letter of this discipline is violating the spirit. The rubric is shared with downstream skills precisely so that scores remain comparable — a score from this skill must mean the same thing as a score from ywc-impl-review.
Read ../references/confidence-gate.md for the canonical rubric definition. The summary below is for quick reference; the reference file is the authoritative source.
| # | Dimension | Weight | One-sentence test |
|---|---|---|---|
| 1 | Scope clarity | 25% | "Can I state in one sentence what is in scope and one sentence what is out — without using vague terms like 'related cleanup' or 'and other improvements'?" |
| 2 | Architecture compliance | 25% | "Does the planned change follow existing structure / naming / abstractions, or am I introducing a new pattern? If new, was it discussed?" |
| 3 | Evidence quality | 20% | "Are the claims I am about to act on backed by primary sources (current file content, official docs, test output) or by inference / memory?" |
| 4 | Reuse verified | 15% | "Have I searched the codebase / package registry for existing utilities that solve this? Listed them and ruled each out with a reason?" |
| 5 | Root cause identified | 15% | "For a bug fix, do I name the underlying cause (not the symptom)? For greenfield work, do I name the underlying user need (not the surface request)?" |
Score each dimension 0–100. Aggregate is the weighted sum, rounded to the nearest integer.
The aggregate sets a tentative band; the single-dim override (below) may then drop it one level.
| Band | Aggregate score | Action |
|---|---|---|
| PROCEED | ≥ 90 | Begin implementation. Report the score in the completion summary or the executor's per-step report. |
| REVIEW | 70–89 | Present 1–3 alternatives or open questions before proceeding. Trigger the Advisor Pattern for any dimension < 70. Do not begin implementation until at least the weakest dimension is raised or explicitly accepted. |
| STOP | < 70 | Do not begin implementation. Report which dimensions are weak and what evidence would raise them. Hand back to ywc-plan (architecture / scope), ywc-spec-validate (evidence), ywc-tech-research (reuse), or ywc-brainstorm (root cause / user need). When the architecture dimension specifically scores below 70 and the decision is irreversible, an advisor dispatch to ywc-architect (Claude Code agent at claude-code/agents/ywc-architect.md) can render a verdict before re-running the gate (see Step 7: STOP-band Advisor Dispatch). |
Single-dim < 50 override: after the aggregate sets the tentative band, if any single dimension scored below 50, drop the band by one level — PROCEED → REVIEW, REVIEW → STOP (STOP is already the floor). It is always a one-level drop, never a jump straight to STOP: an aggregate ≥ 90 with one dimension at 45 lands on REVIEW, not STOP. A dimension at exactly 50 does not trigger the drop. This prevents one strong dimension from masking a fatal weakness, and matches the canonical rule in ../references/confidence-gate.md §3.
A separate, stricter rule applies only when a skill designates a dimension as required via the confidence-gate.md §4 profile: a required dimension scoring < 70 forces STOP regardless of aggregate. This gate does not designate required dimensions for its own runs, so only the < 50 override above applies here.
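The band rules above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the toolkit: the function names, the `required` parameter, and the dictionary keys are hypothetical, and the authoritative rules live in ../references/confidence-gate.md.

```python
from typing import Dict, Iterable

# Bands ordered from best to worst, so "drop one level" is "move one index right".
BANDS = ["PROCEED", "REVIEW", "STOP"]


def tentative_band(aggregate: int) -> str:
    """Map the aggregate score to its tentative band (>= 90, 70-89, < 70)."""
    if aggregate >= 90:
        return "PROCEED"
    if aggregate >= 70:
        return "REVIEW"
    return "STOP"


def decide_band(aggregate: int, scores: Dict[str, int],
                required: Iterable[str] = ()) -> str:
    """Apply the single-dim < 50 override, then the required-dimension rule.

    `required` is a hypothetical parameter standing in for a per-skill
    profile; this gate passes nothing, so only the < 50 override applies.
    """
    band = tentative_band(aggregate)
    # Strictly below 50 triggers the drop; exactly 50 does not.
    if any(score < 50 for score in scores.values()):
        # One-level drop only; STOP is already the floor.
        band = BANDS[min(BANDS.index(band) + 1, len(BANDS) - 1)]
    # Stricter rule for skills that designate required dimensions.
    if any(scores[name] < 70 for name in required):
        band = "STOP"
    return band
```

With an aggregate of 92 and one dimension at 45, `decide_band` returns REVIEW rather than STOP, matching the one-level-drop rule described above.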
State in one sentence what is about to be implemented. If the sentence requires three or more clauses, the work item is too large — split it via ywc-task-generator before continuing.
For each of the five dimensions, do all of the following:
Use references/pre-implementation-checklist.md for the per-dimension probe questions.
aggregate = (scope × 0.25)
+ (architecture × 0.25)
+ (evidence × 0.20)
+ (reuse × 0.15)
+ (root_cause × 0.15)
Round to the nearest integer.
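As a minimal sketch, the weighted sum can be computed with integer arithmetic to avoid floating-point drift. The key names and the half-up tie-breaking are assumptions made for illustration; the rubric only says "nearest integer" and does not specify how an exact .5 is rounded.

```python
from typing import Dict

# Weights in percent, taken from the dimension table (they sum to 100).
WEIGHTS = {
    "scope": 25,
    "architecture": 25,
    "evidence": 20,
    "reuse": 15,
    "root_cause": 15,
}


def aggregate_score(scores: Dict[str, int]) -> int:
    """Return the weighted aggregate of five 0-100 scores as an integer."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("all five dimensions must be scored, none skipped")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0-100, got {value}")
    weighted = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    # Assumption: round half up. Integer math sidesteps Python's
    # round(), which rounds exact halves to the nearest even integer.
    return (weighted + 50) // 100
```

For example, scores of 95 / 90 / 80 / 40 / 85 (scope, architecture, evidence, reuse, root cause) give a weighted total of 81.00, so the function returns 81.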
If any dimension scored below 50 (< 50), drop the band by one level: PROCEED becomes REVIEW; REVIEW becomes STOP (already the floor). A dimension at exactly 50 does not trigger the drop, and the drop is never more than one level — an aggregate-PROCEED with a sub-50 dimension lands on REVIEW, not STOP.
Print the report in the canonical format below (this is the same shape ywc-impl-review uses for post-implementation confidence — by design, so scores stay comparable).
ywc-code-gen, ywc-sequential-executor, etc.) immediately. Carry the score into the executor's per-step report.

Trigger condition: band == STOP AND at least one dimension among (architecture / scope / evidence / reuse / root cause) scored < 50. STOP without a sub-50 dimension routes directly to the upstream skill listed in §Decision Bands without an advisor pass.
This step adopts the advisor dispatch pattern established by ywc-plan Step 3.5 (Architectural Advisor Gate) and ywc-incident-postmortem Step 4.5 — bounded payload, single dispatch, verdict-as-evidence, re-gate or handoff. See ../references/advisor-pattern.md for the canonical pattern definition.
5-step procedure:
proceed | reconsider with refinements | needs more context.

Routing target table (per failing dimension):

| Failing dimension | Routing target | Why |
|---|---|---|
| Architecture (< 50) | ywc-architect agent (claude-code/agents/ywc-architect.md) | Opus-tier design / trade-off analysis for irreversible architectural decisions |
| Evidence (< 50) | /ywc-spec-validate skill | Spec quality review surfaces missing acceptance criteria / contradictions before implementation |
| Reuse (< 50) | /ywc-tech-research skill | Targeted research into existing utilities, packages, or implementations that may eliminate the need to build from scratch |
| Root cause (< 50) | /ywc-debug-rootcause skill + ywc-root-cause-analyst agent | Root-cause specialist agent (Opus, read-only) renders evidence-for / evidence-against verdict on the cause hypothesis |
Scope dimension < 50 does NOT have an advisor target — STOP on scope routes back to ywc-plan (Scale assessment) or ywc-brainstorm (intent surfacing) via §Decision Bands directly; an advisor cannot disambiguate scope without first knowing what the user actually wants.
Budget: 1 advisor dispatch per gate run. Even if multiple dimensions score < 50, dispatch the one with the lowest absolute score (or the architecturally most foundational — architecture > root cause > evidence > reuse) and let its verdict propagate through re-scoring. Multiple dispatches in one gate run defeat the bounded-payload discipline by accumulating context across iterations.
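One way to read the trigger, routing, and budget rules together is sketched below. The function and key names are hypothetical. The source offers "lowest absolute score" and "most foundational" as alternatives; this sketch assumes the lowest score wins and the foundational order only breaks ties.

```python
from typing import Dict, Optional, Tuple

# Advisor targets per failing dimension, from the routing table.
# Scope is deliberately absent: it has no advisor target.
ADVISOR_TARGETS = {
    "architecture": "ywc-architect agent",
    "evidence": "/ywc-spec-validate skill",
    "reuse": "/ywc-tech-research skill",
    "root_cause": "/ywc-debug-rootcause skill + ywc-root-cause-analyst agent",
}

# Most foundational first: architecture > root cause > evidence > reuse.
FOUNDATIONAL_ORDER = ["architecture", "root_cause", "evidence", "reuse"]


def pick_advisor(band: str, scores: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return (dimension, target) for the single allowed dispatch, or None."""
    if band != "STOP":
        return None  # Dispatch is a STOP-band step only.
    # Only sub-50 dimensions that have an advisor target qualify.
    candidates = [d for d in FOUNDATIONAL_ORDER if scores.get(d, 100) < 50]
    if not candidates:
        return None  # Aggregate-only STOP, or scope-only: hand off upstream.
    # min() returns the first minimum it sees, and candidates are listed in
    # foundational order, so ties resolve toward the more foundational one.
    chosen = min(candidates, key=lambda d: scores[d])
    return chosen, ADVISOR_TARGETS[chosen]
```

Returning at most one pair keeps the sketch within the one-dispatch budget even when several dimensions score below 50.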
When NOT to dispatch:
band == STOP with no dimension < 50 (the STOP came purely from aggregate < 70): hand off to the upstream skill listed in §Decision Bands directly.

Confidence Gate Report
──────────────────────
Aggregate: <NN>/100 — <PROCEED | REVIEW | STOP>
Scope clarity: <NN> <one-line evidence>
Architecture compliance: <NN> <one-line evidence>
Evidence quality: <NN> <one-line evidence>
Reuse verified: <NN> <one-line evidence>
Root cause identified: <NN> <one-line evidence>
<If REVIEW or STOP:>
Weakest dimension: <name> (<NN>)
Why: <one or two sentences explaining what is missing>
What would raise it: <concrete evidence or action — file to read, command to run, question to ask>
<If REVIEW:>
Alternatives presented for user decision:
A. <option> — trade-off
B. <option> — trade-off
(C. <option> — trade-off)
<If STOP:>
Routing: <ywc-plan | ywc-spec-validate | ywc-tech-research | ywc-brainstorm> — <why>
The "Weakest dimension" and "What would raise it" lines are mandatory whenever the band is REVIEW or STOP. They are the mechanism that turns the gate from a roadblock into a step-by-step path forward.
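A small sketch of how the report could be rendered, enforcing the two mandatory lines, follows. The function signature and dictionary keys are hypothetical, the exact column alignment of the canonical format is not reproduced, and the alternatives and routing sections are omitted for brevity.

```python
from typing import Dict, Tuple

# Display labels in the order the report lists them.
LABELS = {
    "scope": "Scope clarity",
    "architecture": "Architecture compliance",
    "evidence": "Evidence quality",
    "reuse": "Reuse verified",
    "root_cause": "Root cause identified",
}


def render_report(aggregate: int, band: str,
                  findings: Dict[str, Tuple[int, str]],
                  why: str = "", raise_by: str = "") -> str:
    """Render the report; findings maps dimension -> (score, one-line evidence)."""
    title = "Confidence Gate Report"
    lines = [title, "─" * len(title), f"Aggregate: {aggregate}/100 — {band}"]
    for key, label in LABELS.items():
        score, evidence = findings[key]
        lines.append(f"{label}: {score} {evidence}")
    if band in ("REVIEW", "STOP"):
        # Both lines are mandatory for REVIEW and STOP, so refuse to render without them.
        if not why or not raise_by:
            raise ValueError("REVIEW/STOP reports need 'why' and 'raise_by'")
        weakest = min(LABELS, key=lambda k: findings[k][0])
        lines.append(f"Weakest dimension: {LABELS[weakest]} ({findings[weakest][0]})")
        lines.append(f"Why: {why}")
        lines.append(f"What would raise it: {raise_by}")
    return "\n".join(lines)
```

Raising an error when the explanation is missing is one way to make the "mandatory" requirement mechanical rather than a matter of discipline.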
- ywc-plan (after Scale assessment, before downstream handoff); ywc-code-gen (Step 0, before Reuse Gate); ywc-sequential-executor / ywc-parallel-executor (before per-task implementation begins, especially for tasks with no upstream ywc-plan evidence); ywc-agentic (per-iteration entry).
- ywc-plan / ywc-spec-validate / ywc-tech-research / ywc-brainstorm on STOP.
- ywc-verify-done (the symmetric post-implementation gate: that gates the claim, this gates the start); ywc-impl-review (uses the same rubric for post-review confidence, so scores remain comparable across the two gates).

Before reporting a PROCEED band and beginning implementation, verify:
< 50)

(Procedural failure modes specific to this skill. Behavioral rationalizations are in the table above; do not duplicate here.)
ywc-plan does not automatically transfer to ywc-code-gen, because the code-gen step has its own dimensions to score (especially Reuse verified and Architecture compliance, which ywc-plan does not score as deeply).
references/confidence-gate.md §4), not in extra dimensions.

| Reference | Use when |
|---|---|
| ../references/confidence-gate.md | Authoritative rubric definition; per-skill required-dimension profiles; status mapping; anti-patterns |
| references/pre-implementation-checklist.md | Per-dimension probe questions specific to the pre-implementation moment (this skill's distinct usage from the post-review usage in ywc-impl-review) |
| ../references/advisor-pattern.md | Escalating a weak dimension to Opus for a second opinion before changing the band |
| ../ywc-verify-done/SKILL.md | The symmetric post-implementation gate; both gates use the same rubric so the start-of-work and end-of-work scores are directly comparable |
npx claudepluginhub yongwoon/ywc-agent-toolkit