From QA My App
Run end-to-end tests for some or all pages in this app. Asks the user which routes/tasks to cover (or accepts a filter argument), builds a persistent task-queue index for the run, dispatches batches of test-runner subagents in parallel, verifies each returned result.md against the schema, retries verification failures once, then writes a final cross-task report. Use after a sprint, after a dependency bump, or before a release.
How this skill is triggered — by the user, by Claude, or both
Slash command
/qa-catalog:run-allWhen to use
Use to run the full test suite, perform a regression test across all routes, or re-run only failed or changed tasks before a release. Trigger phrases include "run all tests", "full regression", "run the full test suite", "test everything", "run qa", "run all qa tests", "regression before release", and "run failed tests".
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
- Catalog present: !`test -f QA-tests/catalog.json && echo YES || echo NO`
test -f QA-tests/catalog.json && echo YES || echo NOls QA-tests/tasks/T*.md 2>/dev/null | wc -lls -t QA-tests/results/runs 2>/dev/null | head -3$ARGUMENTS| Setting | Value |
|---|---|
| Dev URL | ${user_config.dev_url} |
| Parallel test runners | ${user_config.parallel_test_runners} |
| Browser channel | ${user_config.browser_channel} |
| Headless | ${user_config.browser_headless} |
| Settle ms | ${user_config.settle_ms} |
| Auth mode | ${user_config.auth_mode} |
If Catalog present is NO, stop and tell the user to run /qa-catalog:init first.
This skill is the supervisor. It owns the task queue, hands work off to qa-test-runner (project-level, installed by /qa-catalog:init Phase 0) subagents, verifies each returned result, and re-dispatches until the queue is empty. Subagents are stateless workers — they get a task, they execute, they write result.md + screenshots, they return one JSON line, they're gone. The supervisor is the only thing that knows the run as a whole.
Read QA-tests/catalog.json and QA-tests/tasks/T*.md to build the full universe [{taskId, taskFile, route}].
If $ARGUMENTS is non-empty, resolve it directly — no prompt:
T01,T03,T07 or T01 T03 T07 → exactly those task ids./customers) → every task whose catalog entry's route starts with it.failed → tasks whose most recent entry in QA-tests/results/history.json is FAIL or BLOCKED.changed → tasks whose route source is dirty per node "${CLAUDE_PLUGIN_ROOT}/scripts/catalog-diff.mjs" --json.If $ARGUMENTS is empty, ask the user one AskUserQuestion with the catalog's routes as a multi-select:
Which routes do you want to test? (uncheck any you want to skip)
/— 2 tasks/customers— 4 tasks/customers/:id— 3 tasks/orders— 5 tasks- … (one row per route in catalog.json)
- Re-run only failing tasks from last run
- Re-run only tasks whose source changed
Default every route to checked. If the user picks one of the two bottom options, ignore the per-route selection and apply the corresponding filter instead.
Build the work list tasks = [{ taskId, taskFile, route, status: "pending", attempt: 0 }].
If the work list is empty, stop and tell the user nothing matched.
Resolve per-role credentials once for the whole run. When auth_mode is per-role, load the credential map (passwords interpolated from env vars) a single time:
node "${CLAUDE_PLUGIN_ROOT}/scripts/auth-resolve.mjs" --json
Hold the returned roles object as credentialsByRole for the dispatch loop. If the file is absent (present: false) or auth_mode is not per-role, leave credentialsByRole unset and every runner falls back to the single shared credential. Surface any role with resolved: false to the user once here (e.g. "role admin has no QA_CRED_ADMIN_PASSWORD set — its tasks will be BLOCKED") so they can fix it before the run finishes. Never print resolved passwords.
runId = <UTC ISO-8601 second precision, ':' replaced with '-'> — e.g. 2026-05-28T14-22-11Z.
runRoot = QA-tests/results/runs/<runId>.
Create <runRoot>/ and one subfolder per task: <runRoot>/<taskId>/.
Write <runRoot>/run.json with run metadata (immutable for the rest of the run):
{
"runId": "<runId>",
"startedAt": "<ISO-8601>",
"filter": "$ARGUMENTS or 'interactive'",
"selectedRoutes": ["/customers", "/orders"],
"settingsSnapshot": { /* the user_config values used */ },
"taskCount": <N>
}
Write <runRoot>/task-queue.json — this is the persistent index that drives the loop. Re-read and re-write it after every state transition so the run is resumable and auditable. After every save of task-queue.json, also run:
node "${CLAUDE_PLUGIN_ROOT}/scripts/render-report.mjs" "<runRoot>"
This refreshes <runRoot>/report.html — a self-contained, auto-refreshing HTML dashboard the user can keep open in a browser tab to watch progress live (meta-refresh every 3s while the run is in flight; refresh disables itself once the run is complete).
{
"runId": "<runId>",
"updatedAt": "<ISO-8601>",
"tasks": [
{
"taskId": "T01-customers-list",
"taskFile": "QA-tests/tasks/T01-customers-list.md",
"route": "/customers",
"status": "pending", // pending | dispatched | verifying | complete | failed-verification
"attempt": 0,
"startedAt": null,
"finishedAt": null,
"verdict": null, // PASS | FAIL | BLOCKED (after verification)
"tcCount": null,
"passCount": null,
"failCount": null,
"blockedCount": null,
"screenshots": null,
"consoleErrors": null,
"networkFailures": null,
"defects": [],
"verificationIssues": [] // filled if verify-result.mjs rejected the runner's output
}
]
}
Render the initial dashboard so the user can open it before the first task lands:
node "${CLAUDE_PLUGIN_ROOT}/scripts/render-report.mjs" "<runRoot>"
Print the absolute path to <runRoot>/report.html to the console so the user can open it. On Windows, you may additionally invoke start "" "<runRoot>/report.html" to auto-open it; on macOS, open; on Linux, xdg-open. Skip the auto-open if you're unsure of the platform.
The loop runs until every task in the queue has status ∈ { complete, failed-verification }. Hold at most ${user_config.parallel_test_runners} runners in flight at any moment.
Repeat:
Pick work. Re-read task-queue.json. Collect tasks with status === "pending". If none AND nothing is in flight → exit the loop. Otherwise take up to parallel_test_runners − inFlight of them.
Mark dispatched. For each picked task:
status = "dispatched", attempt += 1, startedAt = <ISO-8601>.task-queue.json.Hand off. Spawn the project-level agent qa-test-runner for each picked task, in parallel within the batch. Each spawn gets its own Playwright process — no context coordination needed. Payload:
{
"taskId": "<id>",
"taskFile": "<repo-relative path>",
"runDir": "<repo-relative runRoot>/<taskId>",
"runId": "<runId>",
"devUrl": "${user_config.dev_url}",
"settings": {
"browserChannel": "${user_config.browser_channel}",
"headless": ${user_config.browser_headless},
"settleMs": ${user_config.settle_ms},
"authMode": "${user_config.auth_mode}",
"defaultRole": "${user_config.default_role}",
"credentials": { "username": "${user_config.auth_username}", "password": "${user_config.auth_password}" },
"credentialsByRole": { ...the resolved roles object from Phase 0, or omit if not per-role... },
"storageStatePath": "${user_config.auth_storage_state_path}"
}
}
Wait for the batch to report back. Each runner returns one JSON line with { taskId, claimedResult, screenshotsClaimed } and has written result.md + screenshots into <runDir>/<taskId>/.
Verify mechanically. For each returned task, run:
node "${CLAUDE_PLUGIN_ROOT}/scripts/verify-result.mjs" "<runRoot>/<taskId>"
The script returns a single-line JSON { taskId, valid, issues, parsed } (exit 0/1) and is the authority on schema validity (header fields, PASS|FAIL|BLOCKED verdicts, screenshot resolution, defect ids). Trust its valid flag and issues — do not re-validate result.md yourself.
Update the queue:
status = "complete", copy parsed.{result → verdict, tcCount, passCount, failCount, blockedCount, screenshotsOnDisk → screenshots, consoleErrors, networkFailures, defects} onto the task entry, finishedAt = <ISO-8601>.attempt < 2 → set status = "pending", record verificationIssues = issues. The loop will pick it back up and re-dispatch with a fresh contextId.attempt === 2 → set status = "failed-verification", verdict = "BLOCKED", persist verificationIssues. The runner produced unparseable output twice — that's a real defect and gets surfaced in the summary.task-queue.json after every transition, and immediately re-run:
node "${CLAUDE_PLUGIN_ROOT}/scripts/render-report.mjs" "<runRoot>"
so the open browser tab picks up the new state on its next auto-refresh tick.Loop.
Once the loop exits, derive the summary from task-queue.json only (single source of truth). Write <runRoot>/summary.md following the schema we agreed on for results — header table + per-task table + a verification-failures block:
# QA My App run — <runId>
| Field | Value |
|---|---|
| Started (UTC) | <ISO-8601> |
| Finished (UTC) | <ISO-8601> |
| Duration | <hh:mm:ss> |
| Filter | <argument or "interactive: routes /customers,/orders"> |
| Tasks executed | <N> |
| Pass | <X> |
| Fail | <Y> |
| Blocked | <Z> |
| Failed verification | <V> |
| Total screenshots | <S> |
| Console errors | <C> |
| Network failures | <NF> |
| Parallel runners | <user_config.parallel_test_runners> |
| Browser | <channel> / headless=<bool> |
## Task results
| Task | Route | Result | Attempts | Duration | Screenshots | Errors | Details |
|---|---|---|---|---|---|---|---|
| T01-customers-list | /customers | PASS | 1 | 22s | 5 | 0 | [result](T01-customers-list/result.md) |
| T03-customers-edit | /customers/:id | FAIL | 1 | 41s | 8 | 1 | [result](T03-customers-edit/result.md) — DEF-bad-validation |
| T07-orders-bulk | /orders | BLOCKED (verification) | 2 | — | — | — | [result](T07-orders-bulk/result.md) — schema issues: missing TC-04 verdict |
## Defects Found
- DEF-bad-validation — T03-customers-edit
- … (deduped across the run)
## Verification failures
- T07-orders-bulk (attempt 1): no `### TC-NN ... — PASS|FAIL|BLOCKED` headers found
- T07-orders-bulk (attempt 2): screenshot file(s) missing on disk: TC03-confirm.png
Run, in order:
node "${CLAUDE_PLUGIN_ROOT}/scripts/results-index.mjs" append "<runRoot>"
node "${CLAUDE_PLUGIN_ROOT}/scripts/render-report.mjs" "<runRoot>"
The first updates QA-tests/results/history.json, QA-tests/results/latest.json, and QA-tests/results/by-task/<taskId>/latest.json. The second writes a final, non-refreshing <runRoot>/report.html (with the "live" dot dimmed, auto-refresh disabled, and the total run duration in the header) so the dashboard becomes the canonical browse view for the run.
QA My App run-all <runId>
selection: <interactive routes | filter arg>
tasks: <N> pass: <X> fail: <Y> blocked: <Z> verif-failed: <V>
results: QA-tests/results/runs/<runId>/summary.md
report: QA-tests/results/runs/<runId>/report.html ← open in a browser
queue: QA-tests/results/runs/<runId>/task-queue.json
defects: <list of DEF-* ids> (or "none")
qa-test-runner subagents.task-queue.json is the source of truth during the run. Read it before every dispatch, write it after every state transition. The summary at the end is derived from it.parallel_test_runners. Do NOT serialize unless the work list has exactly one task.BLOCKED immediately — the runner couldn't even start).verify-result.mjs says so. The runner's self-report is advisory, not authoritative.QA-tests/tasks/*.md during a run — those are the input contract.task-queue.json belongs to its own run dir; history.json is append-only.Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub elwizard33/qa-my-app --plugin qa-catalog