From qa-test-review
Pure-reference catalog of test code conventions - AAA structure (Arrange / Act / Assert), per-test single-responsibility, descriptive naming patterns (`<system_under_test>_<scenario>_<expected>` vs nested describe), assertion specificity, fixture-coupling rules, and the magic-number / hard-coded-string anti-pattern. The agents in this plugin (test-code-critic, assertion-quality-reviewer, mocking-anti-pattern-detector, e2e-selector-quality-critic) load this as their shared rule book. Use as a team's onboarding reference for "what makes a test code-reviewable" and as the source-of-truth the critics' verdicts cite back to.
How this skill is triggered — by the user, by Claude, or both
Slash command
/qa-test-review:test-code-conventionsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill is a **pure reference** - no actions, no workflows. It
This skill is a pure reference - no actions, no workflows. It
catalogs the test-code conventions that the rest of qa-test-review's
critics enforce. When a critic flags an issue, it cites this
reference for the reviewer to learn the underlying rule.
test-code-critic,
assertion-quality-reviewer, mocking-anti-pattern-detector,
e2e-selector-quality-critic) flagged an issue and the reviewer
needs the underlying rule's rationale.The canonical test shape - Arrange, Act, Assert - splits each test into three phases:
test('addItem increases cart count', () => {
// Arrange
const cart = new Cart();
// Act
cart.addItem({ sku: 'BOOK-001', qty: 1 });
// Assert
expect(cart.itemCount).toBe(1);
});
The phases are visually separated (blank line; comment; or
// arrange / act / assert headers). The benefits:
Each test has exactly one Act. Two Acts = two tests. Splitting on multiple Acts is the canonical refactor when a single test grows too long.
A test that asserts cart.count === 1 AND cart.totalPrice === 10 AND cart.lastUpdated > now() is three tests in disguise. When one
assertion fails, the test stops; the other two failures are
masked.
The rule: one logical assertion per test. "Logical" means
related to the same observable property - multiple expect calls
that all verify "the cart has one item" (count = 1, items.length = 1,
contents[0].sku = '...') are one logical assertion.
Splitting:
// Bad — three logical assertions
test('addItem updates cart', () => {
cart.addItem(...);
expect(cart.itemCount).toBe(1); // assertion 1
expect(cart.totalPrice).toBe(10); // assertion 2 (different property)
expect(cart.lastUpdated).toBeGreaterThan(t0); // assertion 3 (different property)
});
// Good — three tests, each assertion isolated
test('addItem increments count', () => { /* ... */ });
test('addItem updates totalPrice', () => { /* ... */ });
test('addItem updates lastUpdated', () => { /* ... */ });
Two well-established conventions:
<system_under_test>_<scenario>_<expected> (Roy Osherove)test('addItem_validQty_incrementsCount', () => { /* ... */ });
test('addItem_zeroQty_throwsValidationError', () => { /* ... */ });
test('addItem_negativeQty_throwsValidationError', () => { /* ... */ });
The triple makes the test self-documenting; no need to read the body to understand what's verified.
describe('Cart', () => {
describe('addItem', () => {
describe('with valid qty', () => {
it('increments count', () => { /* ... */ });
});
describe('with zero qty', () => {
it('throws ValidationError', () => { /* ... */ });
});
});
});
Both are valid; the team picks one and stays consistent. Mixing the two in one suite is the smell.
Avoid: test('it works'), test('test 1'),
test('addItem 1'), test('addItem 2'). Generic names are zero
debugging help when the failure surfaces.
Assertions should narrow the verification window to exactly what's being asserted. Vague matchers hide regressions:
| Vague | Specific | Why specific is better |
|---|---|---|
expect(x).toBeTruthy() | expect(x).toEqual({ id: 1, name: 'foo' }) | "truthy" passes for 1, 'a', {}, [] - many bug shapes pass. |
expect(arr).toBeDefined() | expect(arr).toEqual(['BOOK-001']) | "defined" passes for [], null-prototype, … |
expect(err).toBeInstanceOf(Error) | expect(err.code).toBe('VALIDATION_ERROR') | Catches "right type, wrong reason" regressions. |
expect(response.status).toBeGreaterThan(199) | expect(response.status).toBe(201) | 200, 204, 299 also pass - masks status-code regressions. |
expect(html).toContain('error') | expect(html).toMatch(/<div class="error">.*Invalid/) | "error" matches "no errors" too. |
The general principle: the assertion should fail on any change to the SUT's behavior that isn't intentional. If the assertion passes for behaviors that shouldn't, it's too loose.
The full taxonomy per mocks-stubs:
"Dummy objects: passed around but never actually used. Usually they are just used to fill parameter lists." (mocks-stubs)
"Fake objects: actually have working implementations, but usually take some shortcut which makes them not suitable for production." (mocks-stubs)
"Stubs: provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in." (mocks-stubs)
"Spies: stubs that also record some information based on how they were called." (mocks-stubs)
"Mocks: objects pre-programmed with expectations which form a specification of the calls they are expected to receive." (mocks-stubs)
Per mocks-stubs: "Only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification."
Convention rules:
Tests that share fixtures (parameters, builders, factory functions) should be coupled at the smallest scope:
| Scope | When to use |
|---|---|
| Inline (per test) | Default. The test owns the data; reviewer sees what's being tested. |
describe-block | Multiple tests verify the same scenario differently. |
File-level (beforeEach) | Shared setup with no per-test variation. |
| Cross-file factory | Shared shapes, not shared instances. Use builders / factories. |
Anti-pattern: a giant globalFixtures.ts that every test imports.
Tests now break in unrelated ways when the global is touched; the
test no longer "owns" what it verifies.
Test code is more tolerant of magic numbers than production code - the test's job is often to assert against specific values. But meaningful magic matters:
// Bad
expect(cart.totalPrice).toBe(43.21); // why 43.21?
// Better
const PRICE_PER_BOOK = 10.99;
const QTY = 4;
const EXPECTED_TAX_RATE = 0.0825;
const EXPECTED_TOTAL = PRICE_PER_BOOK * QTY * (1 + EXPECTED_TAX_RATE);
expect(cart.totalPrice).toBeCloseTo(EXPECTED_TOTAL, 2);
The named constants double as documentation. The reviewer sees the math; the assertion failure message becomes interpretable.
Per pw-best-practices: "Your DOM can easily change so having your tests depend on your DOM structure can lead to failing tests."
Per tl-queries (Testing Library priority order):
| Priority | Query | When to use |
|---|---|---|
| 1 | getByRole('button', { name: 'Submit' }) | Default. Tests via the accessibility tree - same path as users. |
| 2 | getByLabelText('Email') | Form fields with associated <label>. |
| 3 | getByPlaceholderText, getByText, getByDisplayValue | When labels aren't available. |
| 4 | getByAltText, getByTitle | Images, tooltips. |
| 5 | getByTestId('submit-button') | Last resort. Per tl-queries: "The user cannot see (or hear) these." |
CSS class selectors (.button-primary) and XPath
(//div[@class='cart']//button[1]) are not on the priority list -
per pw-best-practices they are explicitly identified as
brittle.
The same convention holds for Playwright (page.getByRole(...)),
Cypress (cy.findByRole(...) via cypress-testing-library), and
Selenium (when the test framework supports role-based queries via
extensions).
Per pw-best-practices: avoid "manual assertions without
waiting. Using isVisible() checks immediately without awaiting,
rather than web-first assertions like toBeVisible() that wait for
conditions to be met."
// Bad — race condition
expect(page.locator('.toast').isVisible()).toBe(true);
// Good — auto-waits
await expect(page.locator('.toast')).toBeVisible();
The web-first form auto-waits for the assertion to become true within the test's timeout, eliminating the wait-N-seconds-and-hope pattern.
A test that takes >1s in setup (creating fixtures, seeding DB, warming caches) has a coupling problem. The remedies:
beforeAll if shared.db-snapshot-restore
template-DB pattern for DB tests instead of db:reset.The whole-suite cost of slow setup compounds: 10 tests × 2s = +20s per CI run × 50 PRs/day = 1000s/day burned.
test-code-critic,
assertion-quality-reviewer,
mocking-anti-pattern-detector,
e2e-selector-quality-critic - the four agents that consume this reference.Provides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.
npx claudepluginhub testland/qa --plugin qa-test-review