From Sumo QA
Use when a review, mutation run, or graded scenario has already named a concrete uncovered behavior gap and the user wants it closed with evidence — e.g. 'close this gap', 'drive this uncovered risk to a regression test', 'fix this surviving mutant end to end'. Drives ONE closed loop per gap — failing test → red evidence → minimal fix → green evidence → risk/ledger update. Pauses when repo context or evidence is insufficient.
How this skill is triggered — by the user, by Claude, or both
Slash command
/sumo-qa:sumo-qa-closing-qa-gapsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
QA evidence — a review's uncovered risk, an unmet acceptance criterion, a mutation survivor, a graded-scenario failure — has named a concrete behaviour with no covering test. This skill drives it to closure one loop at a time (red → minimum change → green → status update), orchestrating the existing disciplines — never replacing them.
QA evidence — a review's uncovered risk, an unmet acceptance criterion, a mutation survivor, a graded-scenario failure — has named a concrete behaviour with no covering test. This skill drives it to closure one loop at a time (red → minimum change → green → status update), orchestrating the existing disciplines — never replacing them.
Announce at the start of EVERY turn while this skill is active — entry, red, green, status-update, refusal, and decline turns all open with: "Closing one QA gap at a time." A turn that skips the announce has left the loop discipline.
Inherits the global discipline from using-sumo-qa: output discipline (no internal taxonomy labels), output economy (findings not framing; one question per turn; no closing pleasantries), knowledge authority hierarchy, specialty-tool fit.
ONE GAP PER LOOP, EVIDENCE AT BOTH ENDS. The red (failing) output is captured before any production change; the green output plus targeted regression is captured before the gap's status moves. A second gap starts only when the user explicitly asks. This skill NEVER declares a change safe to merge — that verdict belongs to sumo-qa-reviewing-before-merge, against fresh evidence.
sumo-qa-deciding-approach routes here for closed-loop-gap-fix: "close this gap", "drive this uncovered risk to a regression test", "take this survivor through the loop".
NOT for: feature work with no named gap (tdd-scaffold / regression-first directly); a batch or module-wide rewrite; the merge verdict itself (sumo-qa-reviewing-before-merge).
You MUST work through these in order, tracked as an ordered work list. Steps 4–7 are one loop; the loop body repeats only via step 8's explicit-ask gate.
Name the ONE gap and its source — restate the gap as a single concrete behaviour (file/function + the uncovered behaviour) and where it came from (review risk or AC row, mutation survivor, graded-scenario failure, user-described defect). If the evidence names several gaps, pick exactly one — the user's stated priority, else the first — and park the rest BY ID, out loud, in the exact parking sentence the Good example models. Never batch.
Inspect the loop's prerequisites — open the production file and the matching test file; confirm the command that runs the relevant tests. Any prerequisite not inspectable → the HARD-GATE fires: stop and ask, this turn.
Route the entry by its kind —
sumo-qa-strengthening-tests discipline FIRST: production code stays UNCHANGED while the test is strengthened to kill the survivor (that skill's Iron Law binds here, unweakened). Here red (step 4) is the strengthened test failing against the MUTANT (unchanged production passes it), and green (step 5) is that test passing with production still unchanged — no production change. The strengthening proposal carries the SAME obligations as step 4: name the survivor by its production location and mutation (e.g. pricing/discounts.py:42 — subtotal > 10000 mutated to >=), cite the technique by its verbatim loaded-catalogue heading (e.g. "boundary value analysis"), AND name the target test file and test. Only if the strengthened test then exposes a genuine production defect does this loop continue into a fix — surfaced as a separate, explicit decision, never silently.sumo-qa-implementing-with-tdd discipline (regression-first): its red-first Iron Law, stub allowance, and confirmation gates apply unchanged.Red — capture the failing evidence before anything changes — write (or request) the focused failing test for THIS gap only, citing the technique by its verbatim loaded-catalogue heading. Never name a test file or function the supplied context has not shown — a test path the context can't ground is fabrication; ask for the files instead. Run it; capture the real assertion failure verbatim. Import/syntax/fixture errors are not red. When the test cannot be run locally (missing runtime, remote-only suite), say so explicitly, record what evidence IS available, and get the user's go-ahead before any change — never claim a red you didn't see.
Green — minimum change, re-run, capture — the smallest production change that closes THIS gap (none on a survivor entry — step 3; or the user makes it, per the TDD skill's handoff). Re-run the test; capture the green output verbatim. A pass for the wrong reason (weakened assertion) is not green.
Targeted regression — run the changed module's tests plus closest siblings; capture pass/fail counts. Any green-to-red elsewhere reopens the loop: surface it and do NOT move the gap's status.
Update the gap's status — only now, only on evidence — the gap's recorded status moves because evidence changed (red→green captured this loop), never because prose changed. If a risk-to-test ledger is in play, update that row in the SAME ledger format — evidence_status from failing/planned to passing — and QUOTE both evidence lines verbatim from the run output the turn supplied, alongside the ledger: (1) the gap's own green line (e.g. tests/billing/test_refund.py::test_refund_over_balance_rejected PASSED) and (2) the targeted-regression count (e.g. 41 passed). A ledger row flipped to passing without BOTH quoted lines is wording-only progress, not coverage. When the row flips to passing, also resolve its residual — a closed blocker becomes mitigated, never left as blocker on a passing row. Touch only the row whose evidence changed. Do not invent a competing status format. If the same turn carries a merge question, step 8's pinned merge-decline applies IN THIS TURN — answer it here, don't defer it.
Close the loop, offer — don't start — the next — report the evidence pair (red, green) plus regression counts, then offer the next parked gap in CONDITIONAL form only: "R2 is next if you want it — say the word and I'll start that loop." Never "for the next loop, R2…" or any phrasing that reads as the loop already opening. Continue only when the user explicitly asks. If the user asks whether it's safe to merge, answer EXPLICITLY with this pinned line — "I can't call this safe to merge — that verdict belongs to sumo-qa-reviewing-before-merge, against fresh evidence; route there next." — omitting the merge question is not declining it. If the user asks for a module-wide tidy/rewrite instead of a named gap, decline it OUT LOUD in this exact shape (substituting the recorded gap ids, risks, and planned tests): "The loop that just closed covered R1 only. Declining the module-wide change — no single named gap, no red evidence to anchor it. Parked: R2 — multi-currency rounding unpinned (planned: rounding boundary test); R3 — close with pending refund (planned: pending-refund guard test). Pick one and I'll run it as its own loop." Then STOP — a broad ask is not an explicit ask for the next gap; do not open any loop on that turn, and do not invite a rewrite confirmation ("confirm and I'll tidy it next") — the decline is unconditional, redirecting to the parked gaps only.
The Checklist above is the flow.
| Thought | Reality |
|---|---|
| "I'll close all three gaps in one pass — it's faster" | One gap per loop — the user's correction on loop 1 changes loop 2. Park the rest, visibly. |
| "The fix is obvious — skip straight to it" | A test that has never failed has never tested anything. Red first, captured verbatim. |
| "This survivor is easiest to kill by tweaking the production code" | Production stays locked until the test is strengthened; a defect found after that is a NEW, surfaced decision. |
| "Green — so it's safe to merge" | Never from this skill. The merge verdict belongs to sumo-qa-reviewing-before-merge against fresh evidence. |
| "The strengthening test is written — mark the ledger row covered" | Written is not run. evidence_status moves on captured output only — quote the green line AND the regression count. |
| "I can't read the repo, but payment services all look alike" | HARD-GATE: pause and ask. Invented paths close loops on fiction. |
| "While I'm in here, I'll tidy the whole module" | Scope is the named gap. Decline unconditionally — don't invite a rewrite confirmation. |
| "User said 'keep going', so I'll fix R2 and R3 together" | Explicit ask buys the NEXT loop, not a batch. |
Review names three uncovered risks; user: "close them."
AI: "Closing one QA gap at a time. Starting with R1 —
apply_refundaccepts an over-balance amount. R2 and R3 are parked as later loops — I'll start the next one only when you ask. Test idea:tests/billing/test_refund.py::test_refund_over_balance_rejected, refund of 150.00 against a 100.00 remaining balance, expectingRefundError— boundary value analysis on the remaining-balance edge. I'll capture the red (failing) run before any production change. IsRefundErrorthe agreed contract for over-balance, or should it clamp to the remaining balance?"On confirm: writes the test, runs it, surfaces the
FAILED … RefundError not raisedoutput, hands off for the green change.
mutmut survivor:
subtotal > 10000→subtotal >= 10000, no test fails. User: "take it through the loop."AI: "Closing one QA gap at a time. Production stays unchanged — strengthening the test first. The survivor lives at
pricing/discounts.py:42(subtotal > 10000mutated to>=). Boundary value analysis atsubtotal = 10000: the original must NOT qualify (>is strict), so the strengthening testtests/pricing/test_discounts.py::test_no_discount_at_exact_thresholdassertsqualifies(10000) is False— the mutant (>=) fails it, the original passes. If the strengthened test instead exposes a real defect inqualifies, I'll surface that as a separate fix decision."
AI: "I've added tests for all three risks, patched
refund.pyandinvoice.py, everything's green — safe to merge."Three violations: batched gaps, production changed before any captured red, and a merge verdict this skill must never issue.
Loop closed → offer the next parked gap, or route to sumo-qa-reviewing-before-merge for the safe-to-merge verdict against fresh evidence. Inside a rollout dispatched by sumo-qa-executing-qa-rollout → sumo-qa-finishing-qa-work for evidence capture + the PR-ready summary.
npx claudepluginhub sumithr/sumo-qa --plugin sumo-qaProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.