From Sumo QA
Use after sumo-qa-planning-qa-rollout to dispatch a written QA plan task-by-task. Each task runs in a fresh subagent (parallel where independent); each subagent's output goes through a two-stage review (test-correctness → test-quality) before the task is marked done. Continuous execution — no per-task check-ins. Finishes by routing to sumo-qa-finishing-qa-work.
How this skill is triggered — by the user, by Claude, or both
Slash command
/sumo-qa:sumo-qa-executing-qa-rolloutThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Take a written plan from `sumo-qa-planning-qa-rollout` (or a hand-written equivalent at `docs/qa/plans/...`) and execute it by dispatching one fresh subagent per task, then walking each subagent's output through a two-stage review.
Take a written plan from sumo-qa-planning-qa-rollout (or a hand-written equivalent at docs/qa/plans/...) and execute it by dispatching one fresh subagent per task, then walking each subagent's output through a two-stage review.
Announce at start: "Dispatching the plan with subagents."
Inherits the global discipline from using-sumo-qa: output discipline (never surface internal taxonomy labels — say "behaviour change in pricing", not "Classification: business_logic_change"), output economy (spend output on findings not framing; no preamble or self-narration; one question per turn; no closing pleasantries), knowledge authority hierarchy, internal scaffolding stays internal, and specialty-tool fit.
ONE FRESH SUBAGENT PER TASK. TWO-STAGE REVIEW. CONTINUOUS EXECUTION.
Fresh subagent prevents context pollution; two-stage review separates "catches the right risk" from "well-shaped test"; continuous because mid-plan check-ins waste the user's attention.
Routes here from:
sumo-qa-planning-qa-rollout when a plan is signed offdocs/qa/plans/...", "run through the test rollout", "dispatch the QA work"For a single-task piece of work, skip this skill — go straight to sumo-qa-implementing-with-tdd or the matching individual skill.
You MUST work through these in order. Steps 1–2 are AI-only homework. The dispatch loop in step 3 is continuous: do NOT pause for user check-ins between tasks. Step 4 only fires when all tasks are done or one is genuinely blocked.
Read the plan (no user question) — load docs/qa/plans/<plan>.md. Extract every task verbatim, its approach tag, files, [parallel]/[sequential] marker, and "done when" criteria. Add an entry to the ordered work tracker per task.
Group by parallelism (no user question) — bucket tasks into parallel waves. All [parallel] tasks with no upstream dependency form wave 1. Sequential or dependency-blocked tasks form wave 2, 3, etc. Most QA plans collapse to 1–2 waves.
Dispatch loop (per wave, continuous):
prompts/implementer-prompt.md, filling in the task spec. Wave dispatches go in parallel (one delegation call per worker, all issued together so the host can run them concurrently).prompts/spec-reviewer-prompt.md. Checks: does the test cover the named risk? Does it run? Did the red phase happen (if TDD)? Did production code stay unchanged (if strengthen / verify-existing)?prompts/quality-reviewer-prompt.md. Checks: observable assertion (not implementation-coupled)? Deterministic? Tautology check?Final cross-task review — when all tasks are done, dispatch a final reviewer with the entire plan + all task outputs. Do the tests collectively cover all named risks? Are there seams between tasks neither covers? Run the full suite; surface counts.
Hand off to sumo-qa-finishing-qa-work — pass the plan, the task outputs, and the cross-task review. That skill captures evidence, produces the PR-ready summary, and closes the loop.
See the Checklist above — that's the flow.
Match the subagent model to the task shape via the host's worker-delegation primitive (where it exposes a model override):
| Thought | Reality |
|---|---|
| "I'll just do task 1 inline — subagents are overkill for 3 tasks" | Iron Law violated. Even small plans get fresh subagents per task. |
| "I'll combine spec + quality into one review subagent to save time" | Two reviews because they ask different questions. One agent doing both skimps on one. |
| "Task 3 mentioned task 2's fixture; I'll inherit context to skip re-explanation" | No. Fresh subagent. Re-explain via the prompt template. |
| "Let me pause after task 2 and ask the user if the direction's right" | Continuous execution. The user signed off the plan; mid-plan check-ins waste attention. |
| "Spec review came back with 2 issues; I'll fix one, push the other to task 5" | Fix both before moving on. |
| "Production code changed in a strengthen-test-coverage task — it was a tiny refactor" | Reject the output. Production stays clean. |
| "All tasks done; I'll just summarise and finish" | Cross-task review first, then route to sumo-qa-finishing-qa-work. |
User: "Execute the plan." AI (announce): "Dispatching the plan with subagents." AI: 6 tasks; 1–5 parallel, 6 sequential on task 1's fixture. Wave 1: 5 implementers dispatched in one message → spec → quality → done. Task 4 spec-review fails round 1; passes round 2. Wave 2: task 6 dispatches after task 1 commits; two-stage review as before. Final: cross-task reviewer confirms 5 risks covered, suite green. Routes to
sumo-qa-finishing-qa-work.
User: "Execute the plan." AI: edits
tests/billing/test_refund.pydirectly with 3 tasks' tests; runs pytest; reports green. Iron Law violated: no fresh subagents, no spec/quality reviews, no audit trail that tests catch the named risks.
After cross-task review passes → sumo-qa-finishing-qa-work to capture evidence, write the PR-ready summary, and close the loop.
npx claudepluginhub sumithr/sumo-qa --plugin sumo-qaProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.