From Flagrare
Goal-driven smoke test for just-implemented features. Drives browser or backend, tests acceptance criteria and exploratory edges, catches errors, fixes gaps, captures permanent test.
Slash command: /flagrare:smoke-test
A goal-driven validation pass for the feature you just implemented. The pass ends only when every scenario — both acceptance-criteria-defined and exploratory — passes against a real running instance, every gap or bug found has been fixed, and the working trajectory has been captured as a permanent test.
The word "smoke" is doing real work here: this is not a full regression suite. It is the shortest path that exercises the new behaviour end-to-end against a real running system. If it can't be done in under ten minutes, the scope is wrong — split the feature, not the test.
Implementation finishing and the feature working are two different events that teams routinely conflate. Tests pass, types check, lint is clean — and the feature is still broken in production because nobody opened the actual app or hit the actual endpoint. Static checks measure code, not behaviour. This skill closes that gap.
There is a second reason. The model that writes the implementation also writes its own test discipline. Without an external loop that exercises the running system, defects that live between units — exactly the defects integration tests are supposed to catch but rarely do completely — ship straight to review.
Before any action, state the goal in one sentence. The goal owns this entire flow; the agent does not exit until the goal is met.
Goal: validate that [feature name / ticket key] works end-to-end against a running instance. Every acceptance criterion passes, every exploratory edge passes, every gap or bug found is fixed before exit, and the successful trajectory is captured as a permanent test.
Surface the goal back to the user in plain prose so they can correct scope before the loop starts.
Inspect the staged diff (git diff --staged --name-only) and the recent context (intake brief if present, last few commits if not) to pick the domain.
| Signal | Domain |
|---|---|
| Diff touches .tsx/.jsx/.vue/.svelte/.css/.scss/component dirs, no backend handlers | ui |
| Diff touches API handlers / controllers / route files / DB migrations / worker code, no frontend files | backend |
| Diff touches both | both |
| Ambiguous | ask via AskUserQuestion with the three options |
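
The table above could be approximated by a small classifier over the staged file list. This is a minimal sketch, not the skill's actual logic: the directory names in `UI_DIRS` and `BACKEND_DIRS` and the function names are assumptions made for illustration, and a real project would substitute its own layout.

```python
from pathlib import PurePosixPath

# Assumed heuristics: the table names the signals but not exact patterns.
UI_SUFFIXES = {".tsx", ".jsx", ".vue", ".svelte", ".css", ".scss"}
UI_DIRS = {"components"}  # hypothetical component directory name
BACKEND_DIRS = {"api", "handlers", "controllers", "routes", "migrations", "workers"}  # hypothetical


def classify_path(path):
    """Return "ui", "backend", or None for a single changed file."""
    p = PurePosixPath(path)
    parent_dirs = set(p.parts[:-1])
    if p.suffix in UI_SUFFIXES or parent_dirs & UI_DIRS:
        return "ui"
    if parent_dirs & BACKEND_DIRS:
        return "backend"
    return None


def pick_domain(paths):
    """Map a list of changed paths to ui / backend / both / ask."""
    kinds = {classify_path(p) for p in paths} - {None}
    if kinds == {"ui"}:
        return "ui"
    if kinds == {"backend"}:
        return "backend"
    if kinds == {"ui", "backend"}:
        return "both"
    return "ask"  # no recognisable signal: fall back to asking the user
```

The input would typically be the output of git diff --staged --name-only split on newlines. A diff with no recognisable signal returns "ask" here; deciding that a diff is purely docs or config, and therefore skippable, is left to the caller.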
Load the matching reference file(s):
- ui → read references/ui.md
- backend → read references/backend.md
- both → read both. The priority order interleaves: do P0–P1 of each domain in parallel, then P2 of each, and so on, so a broken backend doesn't block UI validation and vice versa.

Skip the skill entirely when the diff is purely process / config / docs and no user-observable behaviour changed; say so explicitly rather than running an empty loop.
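
For the both case, the interleaving can be pictured as batches: P0 and P1 of each domain first, then P2 of each, and so on. The sketch below assumes each domain's scenarios arrive as a mapping from tier label to scenario ids; `TIER_BATCHES` and `interleave` are illustrative names, not part of the skill.

```python
# Batches follow the order described above: P0-P1 together, then one tier at a time.
TIER_BATCHES = [("P0", "P1"), ("P2",), ("P3",), ("P4",)]


def interleave(ui, backend):
    """Build an execution plan from two {tier: [scenario_id, ...]} mappings.

    Returns a list of batches; each batch is a list of
    (domain, tier, scenario_id) tuples that can run side by side.
    """
    plan = []
    for tiers in TIER_BATCHES:
        batch = []
        for tier in tiers:
            for domain, scenarios in (("ui", ui), ("backend", backend)):
                batch.extend((domain, tier, sid) for sid in scenarios.get(tier, []))
        if batch:  # skip tiers that neither domain uses
            plan.append(batch)
    return plan
```

Because each batch mixes both domains, a failing backend scenario in one batch does not prevent the UI scenarios in the same batch from being exercised.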
This is the part where AI assistance pays off most. Draft the scenario list once, then freeze it as something the loop executes deterministically.
Gather inputs:
- … ## Acceptance Criteria section), the ticket, the PR description, or directly from the user

Emit a structured scenario list. Each scenario has:
- … S1, S2, …)

Show the list to the user via AskUserQuestion with options to Run all (Recommended) / Edit scope first / Cancel. Don't start driving the browser or hitting the API before scope is confirmed; a smoke test against a wrong scope is just noise.
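
One way to make the scenario list something the loop can execute deterministically is an immutable record per scenario. The source only confirms the S1, S2, … ids and the P0 to P4 tiers; the other fields below (`description`, `source`, `depends_on`) are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

PRIORITIES = ("P0", "P1", "P2", "P3", "P4")


@dataclass(frozen=True)
class Scenario:
    id: str                 # "S1", "S2", ...
    priority: str           # one of PRIORITIES
    description: str        # hypothetical field: what is exercised and what is expected
    source: str             # hypothetical field: "acceptance" or "exploratory"
    depends_on: Tuple[str, ...] = ()  # hypothetical field: ids this scenario relies on

    def __post_init__(self):
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")


def freeze(scenarios):
    """Validate ids and return the scenarios as a tuple sorted by tier."""
    scenarios = list(scenarios)
    ids = [s.id for s in scenarios]
    if len(ids) != len(set(ids)):
        raise ValueError("scenario ids must be unique")
    # sorted() is stable, so scenarios within a tier keep their drafted order.
    return tuple(sorted(scenarios, key=lambda s: PRIORITIES.index(s.priority)))
```

Freezing both the records and the container means the list confirmed by the user is the list that gets run; any later scope change has to go through a new draft.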
Walk the scenarios in priority order, starting at P0 and ending at P4 (P0 catches dead-on-arrival failures, P4 is nice-to-have). The reference files define each tier in detail; this section describes the loop, not the content.
For each scenario:
- … pass, fail, or blocked (depends on a prior scenario failing).

If P0 fails, stop the cascade: fix P0 first, then restart from P0. P0 is "did the feature even load?"; if not, everything else is noise. From P1 onwards, collect failures and triage them together.
No retries on flake. A scenario that passes on second attempt without an intervening change is a defect, not a config setting. Investigate. Race conditions, hydration timing, and connection-pool warmup all hide behind retries.
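
The loop rules above (stop on a P0 failure, mark dependents as blocked, never retry) can be sketched as a single pass. The tuple shape and the `execute` callback are assumptions for illustration; in practice `execute` would drive the browser or call the API.

```python
def run_pass(scenarios, execute):
    """Run each scenario at most once, in the order given.

    scenarios: iterable of (scenario_id, priority, depends_on) tuples,
               already sorted by priority.
    execute:   callable taking a scenario id and returning True on success.
    Returns {scenario_id: "pass" | "fail" | "blocked"}; scenarios after a
    failed P0 are absent because the pass stops there.
    """
    results = {}
    for sid, priority, depends_on in scenarios:
        # A dependency that failed, was blocked, or never ran blocks this scenario.
        if any(results.get(dep) != "pass" for dep in depends_on):
            results[sid] = "blocked"
            continue
        # Single attempt only: a flake is treated as a failure to investigate.
        results[sid] = "pass" if execute(sid) else "fail"
        if priority == "P0" and results[sid] == "fail":
            break  # nothing else is meaningful until P0 is fixed
    return results
```

After a fix, the caller would invoke `run_pass` again from the top, which matches the "restart from P0" rule.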
For every fail, fix it before exit. Not "log it for later." Not "tracked in the PR description." Fixed.
Loop:
- … references/ui.md).
- … blocked by it.
- … pass.

The loop ends only when the scenario list is all green. If a fix takes the work out of scope (a defect not caused by this PR), surface it to the user via AskUserQuestion: Fix in scope / Defer with a follow-up ticket / Block this PR until resolved. The user owns the call; the skill must not silently defer.
A successful smoke pass is wasted unless the next person can re-run it without re-driving the browser by hand. Before declaring the goal met, convert the trajectory into a runnable test:
- … references/ui.md). One file per feature, named for the feature. Use getByRole / getByLabel selectors, never CSS classes or XPaths. Include the same scenarios at the same priorities; the spec becomes the regression test.
- … references/backend.md).

If the project already has a smoke-test suite, add to it. If not, create the suite in the conventional location (tests/smoke/, e2e/, or the framework's default).
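
For the backend side, a captured trajectory might look like the sketch below, using only the standard library. The `/health` and `/missing` paths, the response shapes, and the in-process stand-in server are all hypothetical; they exist only so the example runs on its own. A real capture would point `run_smoke` at the running instance and follow whatever references/backend.md prescribes.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any configured proxy so local URLs are reached directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class StandInHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in service; replace with the real running instance."""

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"title": "Not Found", "status": 404})

    def _send(self, status, body):
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(url):
    """Return (status, parsed JSON body) without raising on 4xx/5xx."""
    try:
        with _OPENER.open(url, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())


def run_smoke(base_url):
    """Replay the captured scenarios once each and report pass/fail per id."""
    results = {}
    status, body = fetch(base_url + "/health")      # S1: reachability
    results["S1"] = "pass" if status == 200 and body.get("status") == "ok" else "fail"
    status, body = fetch(base_url + "/missing")     # S2: error shape
    results["S2"] = "pass" if status == 404 and body.get("status") == 404 else "fail"
    return results


def smoke_against_stand_in():
    """Start the stand-in on a free local port, run the smoke pass, shut down."""
    server = HTTPServer(("127.0.0.1", 0), StandInHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_smoke(f"http://127.0.0.1:{server.server_port}")
    finally:
        server.shutdown()
        server.server_close()
```

Keeping `run_smoke` independent of how the server is started lets the same scenarios run against a local instance, a preview deployment, or CI without edits.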
This step exists because a smoke test that only the agent can run is a regression that will land within weeks. Codifying it makes the next person's work cheaper, not just yours.
Once every scenario passes and the trajectory is captured, surface the final state to the user via AskUserQuestion:
- /flagrare:wrap-up (Recommended): the feature works end-to-end, every gap was closed, the regression test landed.

Never declare done by prose. The button-prompt is the audit trail.
- … AskUserQuestion. Same UX contract as plan-mode's accept tool: a button, not a typing prompt.

[implementation]
↓
/flagrare:figma-matcher (only when UI; verify visual against Figma)
↓
/flagrare:smoke-test (this skill — behavioural validation against a running instance)
↓
/flagrare:wrap-up (static quality: tests, lint, types, /flagrare:implementation-review)
↓
/flagrare:staleness-audit
↓
git commit
↓
/flagrare:open-pr
↓
/flagrare:release-check
When the feature is backend-only, skip the figma-matcher step and start at smoke-test. When it's UI-only, smoke-test runs without the backend branch of the priority loop. Full-stack features run both branches interleaved (see Step 2).
- references/ui.md: UI priority order (P0 preconditions, P1 acceptance criteria, P2 cross-cutting quality gates, P3 exploratory, P4 nice-to-have), Playwright MCP + Chrome DevTools MCP split, axe + keyboard walk
- references/backend.md: backend priority order (reachability → happy path → contract → auth matrix → error shape → boundaries → idempotency → observability → edges → latency), RFC 9457 Problem Details, observability-driven testing, 403/404 tenant-leak check
- docs/research/2026-05-23-ui-smoke-test-best-practices.md: sourced findings the UI reference is built on
- docs/research/2026-05-23-backend-smoke-test-best-practices.md: sourced findings the backend reference is built on

npx claudepluginhub flagrare/agent-skills --plugin flagrare