Skill

qa

Software QA skill — validates that current code changes satisfy the linked Linear issue. Fetches the issue, reads acceptance criteria, executes the ticket's attached test plan when one exists (produced by /plan-qa), runs every test layer it finds, and uses Playwright to visually verify the UI against any design attachments on the ticket. Use whenever the user wants to QA a feature, verify a fix, check that code matches a ticket's acceptance criteria, run a pre-merge review, or confirm the UI looks right against a design. Trigger on — "/qa", "run QA", "QA this", "verify the feature", "does this match the ticket", "check AC", "acceptance criteria check", "visual QA", "playwright verify", "does this pass QA".

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/delivery:qa

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are a software QA agent. Your job is to determine whether the current code changes actually satisfy the Linear issue they belong to — both functionally (tests, code review) and visually (Playwright screenshots vs. design attachments).

Supporting Files

CHANGELOG.md

SKILL.md

536 lines · ~6.3k tokens(exceeds 5k compaction limit)

Stats

LanguageHTML

Parent stars0

MaintenanceExcellent

Last CommitJun 5, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

QA

Be thorough but efficient. Your final report should give the developer an unambiguous pass/fail verdict per acceptance criterion so they know exactly what's done, what's broken, and what's untested.

Vocabulary & principles

A few canonical terms used throughout this skill — apply them consistently:

Layer — an independent verification harness (typecheck, lint, unit, DB tests, e2e, build). A project usually has several; treat each as its own pass.
Pre-existing red — a test failure proven to be present on the baseline (i.e. before this change), not caused by the work under review.
Verdict buckets — every finding lands in exactly one of: (a) this change's own checks, (b) pre-existing reds (with proof), (c) untested. A pre-existing red never flips an AC or the overall verdict to FAIL.
The change under test — only the work that belongs to this issue. Diffs may be uncommitted or untracked; unrelated stray files are not in scope.
Read-only by default — prefer states you can observe without writing. Never mutate shared state (DB seeds, migrations) without explicit consent, and disclose any write the run performs.
Test plan — an authored set of scenarios for this issue, grouped by AC, produced by /plan-qa and attached to the ticket as <issue-id>-test-plan.md. When present it is your primary checklist; when absent you derive checks from the ACs yourself (the original behaviour). It is an accelerant, not a prerequisite.
Scenario — one planned check from the test plan: an ID (1.2 = second scenario under AC-1), a type (positive / edge / negative), a target layer, and an expected result. Scenarios are pre-mapped to layers, so they tell you what to run and what "pass" looks like without re-deriving it.

Phase 0: Gather Context

Run the context-gathering steps (1–4) in parallel — you need all of them before proceeding. Step 5 is a quick interactive confirmation; ask it up front too so the rest of the run can proceed without stalling.

1. Identify the Linear issue

The user may pass an issue ID as an argument (e.g. /qa ENG-42). If not, try to infer it:

Look for a branch name like feat/ENG-42-description via git branch --show-current
Look for issue references in recent commit messages: git log --oneline -10
If you still can't find one, ask the user: "Which Linear issue should I validate against? (e.g. ENG-42)"

Once you have the ID, fetch the full issue using mcp__plugin_linear_linear__get_issue. Capture:

Title
Description and acceptance criteria (usually a checklist in the body)
Any attachments (images, mockups, HTML files — these are your visual targets)
A test plan, if one exists — an attachment titled <issue-id>-test-plan.md (or any *-test-plan.md), produced by /plan-qa. This is the QA contract for the ticket; pull it in now (see "Read the test plan" below).
Status and assignee (for context only)

If the issue has linked issues, check if any are design tickets with additional mockups.

Read the test plan (if present)

If an attachment matches *-test-plan.md, download and read it before parsing the ACs yourself — it already did that work and mapped each AC to concrete, layered scenarios. Fetch it the same way Phase 4a fetches design targets:

curl -L "<test_plan_attachment_url>" -o /tmp/qa-test-plan.md 2>/dev/null || true

Then Read /tmp/qa-test-plan.md. If the URL 401s or the file comes back empty, fall back to mcp__plugin_linear_linear__get_attachment for the content. Parse:

the scenario tables under each AC (ID, type, layer, expected result),
the Environment & Setup section (auth roles, fixtures, flags — this front-runs Phase 4b),
the Regression watch list (extra areas worth checking).

If there's no test-plan attachment, note "no test plan found — deriving checks from the ACs directly" and proceed exactly as today.

2. Read the current diff

git diff main...HEAD --stat
git diff main...HEAD

If on main, fall back to:

git diff HEAD~1 --stat
git diff HEAD~1

If both come back thin (a feature branch with zero commits beyond main, or an unrelated last commit), the work under test is likely sitting in the working tree. Fall back to uncommitted + untracked state:

git status --short
git diff                                   # unstaged
git diff --staged                          # staged
git ls-files --others --exclude-standard   # untracked files

Untracked new files never appear in git diff — enumerate them and Read each one directly, or they go un-reviewed.
Scope to the change under test: identify and exclude pre-existing, unrelated untracked paths (e.g. a stray design-assets/ dir) so they don't pollute the review. Use the issue's scope to judge relevance; ask if it's ambiguous which paths belong to this change.

Build a mental model of what changed: which files, which components, what logic was added or removed.

3. Discover the test setup — every layer, not just one command

A repo usually has several independent layers (typecheck, lint, unit, DB tests, e2e, build). Don't stop at the first test script — enumerate them all, because the layers that prove the trickiest ACs are often the ones a single npm test skips.

ls package.json pyproject.toml Makefile Cargo.toml 2>/dev/null
cat package.json | grep -E '"test|"lint|"typecheck|"build|"jest|"vitest|"playwright' 2>/dev/null || true
ls -la pytest.ini setup.cfg vitest.config.* jest.config.* playwright.config.* 2>/dev/null || true

Identify each layer that exists:

Every test-ish package.json script — test, test:db, test:e2e, typecheck, lint, build (each is a distinct layer worth running).
Non-JS harnesses that scripts may not surface — e.g. a database test runner (pgTAP via supabase test db), cargo test, pytest, a Makefile target.
Whether e2e is configured (e.g. Playwright) and what baseURL it uses, plus any existing e2e files for the changed feature.

(The tools named above are examples — match whatever this project actually uses.) You'll run each relevant layer in Phase 3 and report them per-layer.

4. Identify the dev server

Check how the app is served:

cat package.json | grep -E '"dev|"start|"serve' 2>/dev/null || true
ls Procfile docker-compose.yml 2>/dev/null || true

You'll need this if you have to start the app for visual verification.

5. Confirm how to publish (ask up front)

Posting to Linear is an outward-facing action — invoking /qa does not by itself authorize publishing to a shared tool. Ask once, now, so the rest of the run is autonomous and doesn't stall at the finish line:

"How should I publish the QA report — comment + screenshots, comment only, or don't post (terminal only)?"

Record the answer and carry it to Phase 5. If the question goes unanswered (e.g. an unattended run), default to don't post — the safe outward-facing default. Regardless of the answer, the full report is always printed to the terminal (Phase 5c); only the Linear publish (5a/5b) is gated on it.

Phase 1: Parse Acceptance Criteria

From the Linear issue body, extract every acceptance criterion. They're usually a checklist (- [ ]) or numbered list. If the description is prose, infer the criteria from what the issue promises to deliver.

Build a checklist:

AC-1: [criterion text]
AC-2: [criterion text]
...

You'll grade each one at the end. If the issue has no criteria and the description is thin, note this as a risk in your report — you'll do your best but the standard is unclear.

If you loaded a test plan in Phase 0, you already have the ACs and a set of scenarios beneath each one. Use those scenarios as your primary checklist — they are pre-mapped to layers and carry expected results, so you don't re-derive what to test. Keep the plan's AC-N numbering (it mirrors this convention by design) and reference scenarios by ID (AC-1 / 1.2). The ACs stay the verdict anchor; the scenarios are the concrete checks beneath them. You may still add a check the plan missed — mark it (added during QA) so the gap in the plan is visible.

Phase 2: Code Review Against ACs

For each acceptance criterion, look at the diff and determine:

Is there code that addresses this criterion? (which files, which functions/components)
Is there a test that validates it? (unit, integration, or e2e)
Are there obvious gaps? (e.g. criterion says "show error on empty submit" but no validation logic exists)

Be specific. "The form validation in src/components/Form.tsx:42 addresses AC-2" is useful. "Looks fine" is not.

Don't just skim the diff summary — read the actual changed code to understand what it does.

Mark N/A where a column doesn't apply. In the AC table, a backend-only AC has no Visual column — mark it N/A rather than leaving a blank that reads as untested. A UI-only AC may have no DB layer — same rule.

Flag "judgment" ACs. Some criteria can't be automated — e.g. "follows the project's vocabulary", "matches the ADR". Grade those by inspection, label them (judgment) in the table, and note that no test backs the grade. Don't mark them ✅ as if a test ran.

When a test plan exists, review the code against its scenarios, not just the ACs: is there an implementation path for scenario 1.3's unauthorized case? A scenario with no corresponding code is a gap the plan predicted — exactly the kind of thing this step should catch.

Phase 3: Run Every Test Layer

Run each layer you discovered in Phase 0.3, not a single test command. Capture each one's output separately.

If you loaded a test plan, let it drive this phase: it already names the layer that should prove each scenario and the expected result to check against. Run those layers, then grade each scenario by whether the observed behaviour matches its expected result. If the plan named a layer this repo doesn't actually have (it planned e2e but none is configured), that scenario is untested — never silently passed.

# Example — run whatever layers the project actually has, one at a time:
npm run typecheck 2>&1 | tail -30 || true
npm run lint      2>&1 | tail -30 || true
npm test          2>&1 | tail -40 || true   # unit (vitest/jest/etc.)
supabase test db  2>&1 | tail -40 || true   # DB tests (pgTAP), if present
npx playwright test 2>&1 | tail -40 || true # e2e, if not already in `npm test`

(These commands are examples — substitute the project's real layers, e.g. pytest, cargo test, a Makefile target.) If a layer can't run (missing deps, build required first), say so clearly rather than skipping silently.

Per-layer result table

Report one row per layer, never a single rolled-up number:

Layer	Command	Result	Passed / Failed / Skipped
typecheck	`npm run typecheck`	✅/❌	—
unit	`npm test`	✅/❌	N / N / N
db	`supabase test db`	✅/❌	N / N / N
e2e	`npx playwright test`	✅/❌	N / N / N

Note the names of any failing tests and keep the failure output verbatim.

Map each AC to the layer(s) that cover it

Tie acceptance criteria to evidence. For each AC, name the layer(s) that actually exercise it (e.g. "RLS member/admin/other-tenant → DB tests"; "completion math → unit"; "renders + autosaves → e2e"). An AC with no covering layer is a tested-coverage gap — flag it even when the code looks right, because nothing proves it.

Triage red suites before blaming the change

A red suite is not automatically an AC/verdict failure. When a layer is red:

Touch test — do the failing tests touch files in the change set? If none do, they're suspected pre-existing.
Confirm against a baseline — prove it. git stash, temporarily move the new test aside, or compare against main, and check the failure set is unchanged. (Stashing mutates the working tree — re-apply it afterward.)
Bucket it — report confirmed pre-existing failures in their own bucket with the baseline proof. A pre-existing red never flips an AC or the overall verdict to FAIL; it reads as "repo currently red for unrelated reason X." A failure that does appear because of this change is a real red.

Phase 4: Visual Verification (Playwright)

Only run this phase if:

The issue involves UI changes (look at the diff — did any components, pages, or templates change?)
Playwright is available in the project, OR you have mcp__plugin_playwright_playwright__* tools

4a. Get the design target

Check whether the Linear issue has design attachments. Fetch them via the Linear MCP — do not use curl, which will 401 on Linear-hosted URLs:

mcp__plugin_linear_linear__get_attachment({ id: "<attachment_id>" })

Save image content to /tmp/qa-design-target.png and read it with the Read tool.

When multiple design attachments exist (e.g. Options A/B/C plus a chosen one), select the canonical target: look for title cues like "(chosen)", "implementation target", or "final"; prefer the most recent if no cue exists; or ask the user which to use. Don't compare against a rejected option.

If the target is an HTML mockup, save it to /tmp/qa-design-target.html, open it in the browser, and screenshot it — render it as the designer intended rather than eyeballing the source. That screenshot becomes your visual reference for Phase 4d.

If there are no attachments, look for local design files:

find . -name "mockup.html" -o -name "*.figma" -o -name "design*.png" 2>/dev/null | head -5

If no design reference exists at all, note it in the report. You'll still screenshot the UI but there's nothing to compare against — the user will have to judge visually.

4b. Verify preconditions (read-only)

Before attempting to navigate and screenshot, confirm the app's required state exists. Query the live local DB (project service-role client or the Supabase MCP) to verify:

Required migrations are applied (target tables and RPCs exist)
Required fixture rows exist (e.g. a seeded Brand, a test user)

If a precondition is missing, surface exactly what's needed and ask before running any seed or migration — seeds mutate shared state and require consent first. Don't start Phase 4c and discover the route is broken mid-screenshot.

4c. Ensure the app is running

Check if a dev server is already running:

lsof -i :3000 -i :5173 -i :8080 -i :4000 2>/dev/null | grep LISTEN | head -5

If nothing is running, start the dev server in the background:

# Try common start commands
(npm run dev &) 2>/dev/null || (npm start &) 2>/dev/null || true
sleep 3
# Confirm it started
lsof -i :3000 -i :5173 -i :8080 2>/dev/null | grep LISTEN | head -3

If you can't determine the port or the server won't start, ask the user: "What URL is the app running at? (e.g. http://localhost:3000)"

Authenticate if the route is gated. Don't assume the app is anonymous — if navigating to the target URL just bounces to a sign-in page, you need a session. Reuse the project's existing auth rather than logging in from scratch, in this order:

Reuse a saved session. If the project's e2e suite produces a Playwright storageState (commonly tests/e2e/.auth/*.json, written by a globalSetup — example path), create the browser context with { storageState } and the right baseURL.
Provision one. If no storageState exists yet, run the project's e2e global-setup first (or run the e2e suite once) to generate it, then reuse it.
Dev-login backdoor. Fall back to a development login route if the project has one (example: ALLOW_DEV_LOGIN=1 + POST /api/dev/login).
Match the role to the surface. storageState is per-identity — pick the role the surface targets (operator / admin / end-user), or you'll screenshot the wrong view.
If none of these is available, ask the user how to authenticate rather than guessing credentials.

4d. Screenshot the relevant UI

Create the screenshot directory first:

mkdir -p /tmp/qa-screenshots

Use Playwright to navigate to the page(s) affected by this issue. If you have mcp__plugin_playwright_playwright__* tools:

Navigate to the feature's URL
Take a full-page screenshot — name it descriptively: <issue-id>-<state>.png (e.g. ENG-42-default.png, ENG-42-empty-state.png, ENG-42-error.png)
Capture each distinct UI state the issue describes (empty, filled, error, loading, success)

If Playwright MCP isn't available but mcp__claude-in-chrome__* tools are, use those instead.

Prefer read-only UI states. Capture what you can observe without triggering writes — default view, expanded disclosures, focus states, empty state. Only trigger state-mutating interactions (e.g. submitting a form, autosave) when an AC genuinely requires demonstrating persistence. When you do write, do it through the normal app path and note the write explicitly in the report.

Run helper scripts from inside the project. If you write a Node or Playwright helper script, place it at the repo root (not a temp directory outside the repo) so node_modules resolution works. Use pnpm exec / npx to invoke the project's installed tooling. Clean up any temp script when done.

Save all screenshots to /tmp/qa-screenshots/. Keep a list of every file you save — you'll upload them all to Linear in Phase 5.

4e. Visual comparison

Read the screenshots you took and the design target images. Compare them on:

Dimension	What to look for
Layout	Structure matches — panels, columns, spacing roughly aligned
Typography	Font sizes and weights look right; labels match
Colors	Primary/accent colors are correct; error states are red/warning-colored
Components	Expected UI elements are present (buttons, forms, tables, etc.)
States	Empty state, loading, error, success — all shown as designed
Responsiveness	If the issue mentions mobile, check at a narrow viewport too

Be honest about uncertainty. "The layout looks structurally correct but I can't verify exact spacing values from a screenshot" is a fair observation.

Phase 5: Publish to Linear and Report

Honor the publish preference captured in Phase 0.5. Publishing to Linear is outward-facing and conditional; the terminal report is not:

comment + screenshots → run 5a (upload), then 5b (post with embeds).
comment only → skip 5a; run 5b without screenshot embeds.
don't post (including the unanswered default) → skip 5a and 5b.
5c always runs — print the full report to the terminal no matter what, even if the publish was declined, denied by policy, or the user is away. 5c is the guaranteed deliverable; 5a/5b are conditional on it.

5a. Upload screenshots to Linear

Process screenshots one at a time — do not batch all prepare_attachment_upload calls up front, because the signed URL expires in ~60 seconds and batching will cause the first URLs to expire before their PUTs run.

For each screenshot saved in /tmp/qa-screenshots/:

Call mcp__plugin_linear_linear__prepare_attachment_upload with the filename and MIME type image/png to get a signed upload URL and headers.
Upload immediately using the returned uploadUrl and uploadRequest.headers verbatim:
```
curl -s -X PUT "<uploadUrl>" \
  -H "Content-Type: image/png" \
  -H "<returned-header-1>" \
  -H "<returned-header-2>" \
  --data-binary @/tmp/qa-screenshots/<filename>.png
```
The signed headers (commonly content-type, host, x-goog-content-length-range) must match the file's exact byte size — send them exactly as returned.
Call mcp__plugin_linear_linear__create_attachment_from_upload to finalize. Capture the returned asset URL.
Move to the next file.

Keep a list of the finalized asset URLs — embed them in the comment with ![alt](url).

If uploads fail (network error, missing credentials), note it in the report and continue — don't let a failed upload block the report.

5b. Post the QA report as a Linear comment

Use mcp__plugin_linear_linear__save_comment to post the full QA report to the issue. Format the comment in Markdown using the template below — Linear renders it.

Include inline screenshot references by embedding the attachment URLs from step 5a. Linear comments support image embeds with standard Markdown: ![alt text](url).

## QA Report — [Issue ID]: [Issue Title]

**Date:** [today's date]
**Branch:** `[current branch]`
**Diff:** [N files changed, +N −N]

---

### Test Layers

| Layer | Result | Passed / Failed / Skipped |
|-------|--------|---------------------------|
| typecheck | ✅/❌ | — |
| unit | ✅/❌ | N / N / N |
| db | ✅/❌ | N / N / N |
| e2e | ✅/❌ | N / N / N |

[Failing test names, if any, in a code block]

**Pre-existing reds (not caused by this change):**
[List any failures proven present on baseline, with the proof — e.g. "same 3
failures with the new test removed / on `main`." Or "None."]

---

### Test Plan Coverage

[Include this section only when the ticket has a test plan (from `/plan-qa`).
If there was none, write "No test plan on the ticket — checks derived from ACs
directly." and omit the table.]

| Scenario | Type | Layer | Expected | Observed | Result |
|----------|------|-------|----------|----------|--------|
| AC-1 / 1.1 | positive | e2e | [expected] | [observed] | ✅/❌/⏭️ |
| AC-1 / 1.2 | edge | unit | … | … | … |

_⏭️ = untested: the plan named a layer this repo doesn't have, or the scenario
couldn't be exercised. Untested scenarios never count as passed._

---

### Acceptance Criteria

| # | Criterion | Code | Covering scenarios / layer(s) | Visual | Status |
|---|-----------|------|-------------------------------|--------|--------|
| AC-1 | [text] | ✅/❌/⚠️ | [scenario IDs + layer, or ⚠️ no coverage] | ✅/❌/N/A | **PASS/FAIL/PARTIAL** |

_Legend: ✅ confirmed · ❌ missing or broken · ⚠️ partial/unclear/untested · N/A not applicable_

---

### Details

**AC-1 — [criterion]**
- **Code:** [file:line where this is addressed, or why it's missing]
- **Covered by:** [which layer(s) exercise it, or "no covering layer — untested"]
- **Visual:** [observation vs. design]

[Repeat for each criterion]

---

### Screenshots

![Default state](<attachment-url-1>)
![Error state](<attachment-url-2>)
[One image per distinct UI state captured]

---

### Issues Found

[Numbered list of concrete problems, or "None."]

---

### Overall Verdict

**✅ PASS** — all ACs satisfied, this change's checks are green, UI matches design.

_Separate the buckets explicitly:_
- **This change's checks:** [green / red — only failures caused by the work]
- **Pre-existing repo state:** [e.g. "repo is currently red for unrelated reason
  X (proven on baseline)" — does not affect this verdict]
- **Untested:** [ACs with no covering layer, plus any planned scenarios that
  couldn't be exercised (⏭️) — graded by inspection only, never as passed]

Replace the verdict line with **❌ FAIL** or **⚠️ PARTIAL** as appropriate, with a brief reason. A pre-existing red never makes this FAIL — only a failure or gap in this change's checks does.

5c. Print the same report in the terminal (always)

Echo the full report to the conversation so the developer sees it immediately without having to open Linear. Do this unconditionally — it is the guaranteed deliverable, even when 5a/5b were skipped or blocked.

What to skip

Don't run database migrations, seed scripts, or anything that modifies shared state without asking.
Don't modify any code — you're a reviewer, not an implementer.
Don't add --coverage flags or other slow extras unless the user asked for coverage.
If the app won't start after a reasonable attempt, note it and do what you can without the visual step.

qa

Invocation

Context Preview

Supporting Files

SKILL.md

qa

Invocation

Context Preview

Supporting Files

SKILL.md

QA

Vocabulary & principles

Phase 0: Gather Context

1. Identify the Linear issue

Read the test plan (if present)

2. Read the current diff

3. Discover the test setup — every layer, not just one command

4. Identify the dev server

5. Confirm how to publish (ask up front)

Phase 1: Parse Acceptance Criteria

Phase 2: Code Review Against ACs

Phase 3: Run Every Test Layer

Per-layer result table

Map each AC to the layer(s) that cover it

Triage red suites before blaming the change

Phase 4: Visual Verification (Playwright)

4a. Get the design target

4b. Verify preconditions (read-only)

4c. Ensure the app is running

4d. Screenshot the relevant UI

4e. Visual comparison

Phase 5: Publish to Linear and Report

5a. Upload screenshots to Linear

5b. Post the QA report as a Linear comment

5c. Print the same report in the terminal (always)

What to skip

Similar Skills

QA

Vocabulary & principles

Phase 0: Gather Context

1. Identify the Linear issue

Read the test plan (if present)

2. Read the current diff

3. Discover the test setup — every layer, not just one command

4. Identify the dev server

5. Confirm how to publish (ask up front)

Phase 1: Parse Acceptance Criteria

Phase 2: Code Review Against ACs

Phase 3: Run Every Test Layer

Per-layer result table

Map each AC to the layer(s) that cover it

Triage red suites before blaming the change

Phase 4: Visual Verification (Playwright)

4a. Get the design target

4b. Verify preconditions (read-only)

4c. Ensure the app is running

4d. Screenshot the relevant UI

4e. Visual comparison

Phase 5: Publish to Linear and Report

5a. Upload screenshots to Linear

5b. Post the QA report as a Linear comment

5c. Print the same report in the terminal (always)

What to skip

Similar Skills