From qa-test-review
Pure reference catalog of test-step design patterns at the architecture tier - step granularity (one logical action per step), abstraction layers (mechanical → page → business), step extraction rules (when to inline / when to extract to a helper / when to extract to a Page Object method), the declarative-vs-imperative phrasing rule, FIRST principles (Fast / Independent / Repeatable / Self-validating / Timely), and the AAA / Given-When-Then mapping. Distinct from `test-code-conventions` §1 (AAA structure at the file level) and from `manual-step-to-gherkin` (Gherkin-specific translation) - this catalog is the cross-framework architecture-tier reference for what a step IS, when it should exist, and where it should live.
How this skill is triggered — by the user, by Claude, or both
Slash command
/qa-test-review:test-step-design-patternsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill is a **pure reference** - no execution steps. It is the catalog the [`framework-architecture-auditor`](../../agents/framework-architecture-auditor.md), [`test-code-critic`](../../agents/test-code-critic.md), and [`playwright-codegen-reviewer`](../../../qa-web-e2e/agents/playwright-codegen-reviewer.md) cite when auditing step granularity at the architecture tier - within an "Act" phas...
This skill is a pure reference - no execution steps. It is the catalog the framework-architecture-auditor, test-code-critic, and playwright-codegen-reviewer cite when auditing step granularity at the architecture tier - within an "Act" phase, what is one step? It complements test-code-conventions §1 (AAA structure at the file level).
Canonical source: Robert C. "Uncle Bob" Martin - Clean Code (2008), chapter 9 "Unit Tests", reaffirmed in Clean Coder and across his blog. The FIRST mnemonic is the foundational quality bar for every test step.
| Letter | Principle |
|---|---|
| F - Fast | Tests must be fast. Slow tests don't get run; tests that don't get run don't catch bugs. |
| I - Independent | Tests must not depend on each other. Each test sets up its own world. Per Fowler on test isolation, this is the prerequisite to parallel execution and selective re-runs. |
| R - Repeatable | Tests run consistently on any environment (laptop, CI, prod-like). No "runs only on Tuesdays" / "passes on Linux only." |
| S - Self-validating | Tests pass or fail on their own. No human reads logs to determine the outcome. |
| T - Timely | Tests are written close to (ideally before) the production code. Stale tests rot. |
FIRST is the underlying rationale for most of the patterns below. A step that violates FIRST is the smell; the patterns prescribe the fix.
A "step" is the minimal unit a test reader can name in business terms. One logical action per step - not one click, not one assertion, but one meaningful operation.
Each step should do exactly one of: Arrange (setup), Act (the operation under test), Assert (verify outcome), or Annotate (logging / labelling). Steps that mix two are split.
| Smell | Refactor |
|---|---|
await page.click('#submit'); expect(toast).toBeVisible(); | Two steps: the Act (submit) and the Assert (toast visible). Split. |
const user = await createUser({...}); await login(user); | Two Arrange steps. Acceptable as one logical "Arrange a logged-in user" if the helper is named that way (Pattern 4). |
await page.click('#row-1'); await page.click('#row-1-edit'); await page.fill('#name', 'New'); | Three mechanical clicks comprising one business action ("edit row 1's name"). Extract to a single business step (Pattern 4). |
Aim for 3-8 steps per test body (counting Arrange / Act / Assert phases as steps). >15 is the threshold the framework-architecture-auditor flags as a smell - the test is doing too much or operating at the wrong abstraction.
| Anti-pattern | Why it fails |
|---|---|
One step that does everything: await fullCheckoutFlow(); | Single line of action; the test reads as "did the helper work?" not as "did the SUT work?" |
| 30 mechanical clicks comprising one logical flow | Brittle to UI changes; the test reads as a script, not a specification |
Mixed Act + Assert in one line (expect(await page.click(...)).toBeTruthy()) | Cannot distinguish "the action failed" from "the assertion failed" in the diagnostic |
One step with two unrelated assertions (expect(cart.total).toBe(10); expect(user.role).toBe('admin');) | Test fails on the first assert; the second is never evaluated. Split per test-code-critic §2 single-responsibility |
The dominant test-code smell at scale is mechanical steps in the test body. The fix is layered abstraction. Three layers, named consistently:
| Layer | Vocabulary | Example |
|---|---|---|
| Business layer (the test body) | Domain verbs the PM / business stakeholder would recognise | customer.signsIn(), cart.addsItem(sku), checkout.placesOrder() |
| Page / Component layer (Page Objects, Tasks, Service Objects) | Page-specific or component-specific operations | LoginPage.submit({ email, password }), CartPage.applyCoupon(code) |
| Mechanical layer (the framework primitives) | Click, type, navigate, request | page.click(), page.fill(), request.post() |
The test body lives at the business layer. The test never reaches into the mechanical layer directly. If it does, the team has either no abstraction or the abstraction leaks.
Bad (test body at the mechanical layer):
test('places an order', async ({ page }) => {
await page.goto('/login');
await page.fill('#email', '[email protected]');
await page.fill('#password', 'pass');
await page.click('#submit');
await page.goto('/product/sku-001');
await page.click('button.add-to-cart');
await page.goto('/checkout');
await page.fill('#shipping-address', '123 Main St');
await page.click('#place-order');
await expect(page.locator('.confirmation')).toBeVisible();
});
Good (test body at the business layer):
test('places an order', async ({ customer, cart, checkout }) => {
await customer.signsIn();
await cart.addsItem('sku-001');
await checkout.placesOrderWithDefaultShipping();
await expect(checkout.confirmation).toBeVisible();
});
The mechanics live in the Page Object / Task / fixture. The test body reads as the specification, not the implementation.
Some tests legitimately operate at the mechanical layer - testing the mechanical surface itself:
getByRole works across viewports.For these, the test body at the mechanical layer is correct. Tag them (@a11y, @visual) so reviewers don't refactor them by mistake.
When should a step move out of the test body?
| Heuristic | Action |
|---|---|
| The step appears in 3+ tests | Extract to a Page Object method / Task / fixture |
| The step is mechanical (click, fill, navigate) but the test isn't about that mechanic | Extract |
| The step is business-meaningful and only used here | Keep inline; name it well |
| The step is 5+ lines of mechanical operations | Always extract |
| The step requires explanatory comments | The comment is a smell; the abstraction is missing - extract |
| The step is the first thing in 80% of tests | Extract to a fixture (per-test or per-describe setup) |
Per Refactoring (Fowler 1999, 2nd ed. 2018), duplicate code is acceptable at first occurrence (write it inline). At the second occurrence, note the duplication. At the third occurrence, extract.
This applies to test steps: don't pre-extract on the first test ("we might need this later" is YAGNI). Extract when the third test needs the same step.
| Anti-pattern | Why it fails |
|---|---|
| Extracting every step on the first test | YAGNI; the abstraction doesn't match real usage |
| Never extracting (every test is 30 lines of mechanics) | Mechanical leakage; brittle |
Extracting to a helper that doesn't belong to a layer (helpers/random-stuff.ts) | Sprawl; the team can't find the helper |
| Extracted helper that takes 8 parameters | The helper is doing too much; split it |
Helper that hides Act vs Arrange (named setupAndDoThing()) | Test reads as if it's doing one thing when it does two |
Two equivalent step-grouping vocabularies. Same content, different name conventions.
| AAA (xUnit tradition) | Given-When-Then (BDD tradition) |
|---|---|
| Arrange | Given |
| Act | When |
| Assert | Then |
| (Annotate) | (no equivalent; goes in step comments) |
Both express the same three phases. The team picks one and uses it consistently. Don't mix: if some tests use AAA comments and others use Given-When-Then helpers, the cognitive cost grows.
Whatever vocabulary the team uses, the phases must be visually separable. The pattern from test-code-conventions §1:
test('places an order', async () => {
// Arrange
const customer = await aCustomer().withCartItem('sku-001').build();
// Act
const order = await customer.placesOrder();
// Assert
expect(order.status).toBe('confirmed');
});
Or, with blank-line separation (no comments needed):
test('places an order', async () => {
const customer = await aCustomer().withCartItem('sku-001').build();
const order = await customer.placesOrder();
expect(order.status).toBe('confirmed');
});
Either works; lacking either fails test-code-critic §1 AAA review.
Even at the business layer, two phrasings exist:
| Imperative | Declarative |
|---|---|
| "The customer enters the email and password and clicks submit" | "The customer signs in" |
| "The customer adds 5 items to the cart and proceeds to checkout" | "The customer initiates checkout with a 5-item cart" |
| "The customer is created with role=admin and org_id=42" | "An admin customer in org 42" |
The declarative form is preferred per Cucumber's better-Gherkin guidance (which applies broadly, not just to Gherkin): "scenarios should describe the intended behaviour of the system, not the implementation."
The test: would the wording need to change if the implementation changed? If yes, the step is imperative; rewrite declaratively.
| Anti-pattern | Why it fails |
|---|---|
Declarative step that hides a critical mechanic (customer.signsIn() when the test is about SSO redirect) | The test no longer tests what it claims to test |
| Imperative step that re-describes the system internals | Brittle to refactors; the test breaks when the implementation changes for unrelated reasons |
| Mixing imperative and declarative within one test | Reader can't tell what abstraction level they're operating at |
Per Roy Osherove's The Art of Unit Testing (2013), test names follow the pattern <system_under_test>_<scenario>_<expected_outcome>. Step naming (within the test) follows similar discipline:
| Smell | Refactor |
|---|---|
await doThing() | Name what the thing is: await customer.signsIn() |
await test1() | Helpers don't get numeric names; describe what they do |
await x = await getX() | Single-letter variables hide what's being created |
await page.click('#submit') | Wrap in a named Page Object method: await loginPage.submit() |
A reviewer should be able to read the test body aloud and have it sound like a specification:
"A customer signs in. They add SKU-001 to their cart. They place an order with default shipping. The order is confirmed."
If reading aloud doesn't produce a specification - if it produces "click, type, click, click, navigate, click, expect-truthy" - the steps are at the wrong abstraction.
| Scenario | Pattern |
|---|---|
| Test reads as 20 clicks-and-types | Extract to Page Objects / Tasks (Pattern 4) and rewrite at business layer (Pattern 3) |
| Test does Arrange and Act in the same line | Split (Pattern 2 single-purpose rule) |
| Three tests share the same first 4 lines | Extract to fixture / helper (Pattern 4 rule of three) |
| Test phases not visually separable | Add blank lines or AAA comments (Pattern 5) |
| Step name reads as implementation detail | Rewrite declaratively (Pattern 6) |
| Step requires explanatory comment to understand | The abstraction is missing - extract (Pattern 4) |
| Test mixes business and mechanical vocabulary | Pick one layer per test (Pattern 3) |
| Anti-pattern | Why it fails |
|---|---|
| Tests with 30+ steps (one logical thing per "step" but the whole test does 10 logical things) | Per test-code-critic §2, single-responsibility violation; split into multiple tests |
Helpers that wrap one framework call (async function click(sel) { await page.click(sel); }) | Wrapping for the sake of wrapping; adds no abstraction |
Step extracted into a helper but the helper takes a boolean flag that branches behavior | Two helpers masquerading as one |
| Step that does retry / wait / fallback inside | Hides flakiness; the test passes when it should fail |
| Tests that read top-down look fine but the helpers contain hidden assertions | The test seems to assert one thing; actually asserts more (or different things) |
| Tests where the assertion is inside the Page Object method | Violates object-model-patterns no-assertion rule |
framework-architecture-auditor (preloads this skill).test-code-critic.playwright-codegen-reviewer.manual-step-to-gherkin.object-model-patterns (sister catalog).test-isolation-patterns (sister catalog).test-data-patterns (sister catalog).test-code-conventions.<sut>_<scenario>_<expected> naming pattern cited in test-code-conventions §3): ISBN 978-1617290893.Test Method, Assertion Method, Custom Assertion, Inline Resource: ISBN 978-0131495050.test-code-conventions, test-code-critic, framework-architecture-auditor, playwright-codegen-reviewer, manual-step-to-gherkin - the related-tier components.object-model-patterns, test-isolation-patterns, test-data-patterns - sister architecture-tier pattern catalogs.npx claudepluginhub testland/qa --plugin qa-test-reviewProvides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.