From qa-flake-triage
Reference catalog of flake patterns - async/timing, test ordering, shared parallel state, resource leaks, network, locator drift, environment variance, randomness - with detection heuristics and remediation per pattern. Use when triaging an unknown flake to identify the category before bisecting.
Slash command: `/qa-flake-triage:flake-pattern-reference`
> **Terminology note:** "flaky test" is a practitioner-emergent term popularized by the Google Testing Blog (google-causes, google-flaky); ISTQB does not maintain a canonical entry. This catalog reflects industry-engineering consensus, not ISTQB authority.
A flake is rarely random - it almost always falls into one of eight recurring patterns. Identifying the pattern early shrinks the bisect search space dramatically. This catalog is a reference, not a workflow; the matching workflow is in `flaky-test-quarantine`, and the agent that drives a structured bisect is `e2e-flake-bisector`.
The Google Testing Blog observed a near-linear correlation between test size and flakiness rate across ~4.2M tests (google-causes) - larger tests touch more of the eight patterns at once.
**Async / timing.** The most common flake category in UI and integration tests.
| Signal | What's happening |
|---|---|
| Fails ~5–20% of runs; passes when the machine is faster | Test waits for an arbitrary `setTimeout(N)` instead of a deterministic event. |
| Fails on CI but never locally | CI runners have different cold-start timings than dev laptops. |
| Fails after a dependency upgrade with no test code change | Library's internal timing changed (e.g. Playwright auto-wait window). |
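
The first row of the table, a fixed sleep standing in for an event, can be illustrated without any browser. The sketch below is a minimal, framework-free version of "wait for the condition, bounded by a timeout"; `waitFor`, `banner`, and the delays are hypothetical names and values chosen for illustration, not part of Playwright or Cypress.

```typescript
// Poll `probe` until it returns a truthy value or the deadline passes.
// Hypothetical helper for illustration; not part of any test framework.
async function waitFor<T>(
  probe: () => T | null | undefined | false,
  timeoutMs = 2000,
  intervalMs = 20,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = probe();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Simulated app: the banner "appears" after an unpredictable delay (10-89 ms).
let banner: string | undefined;
setTimeout(() => {
  banner = "Saved";
}, 10 + Math.floor(Math.random() * 80));

// A fixed 50 ms sleep followed by an assertion would fail whenever the delay
// exceeds 50 ms. Waiting on the condition itself removes that race.
const bannerText: Promise<string> = waitFor(() => banner, 1000, 5);
```

The timeout still bounds the wait, so a genuinely missing banner fails with a clear message instead of hanging.
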
Remediation:

- Replace fixed sleeps with deterministic waits: `await expect(loc).toBeVisible()`, `page.waitForLoadState('networkidle')`, `page.waitForFunction(...)`, etc.
- Disable animations (`animations: 'disabled'` in Playwright; `Cypress.config('animationDistanceThreshold', 0)` in Cypress).
- Control time in time-dependent code with fake timers: `sinon.useFakeTimers()` / `vi.useFakeTimers()` / Playwright's `page.clock.install()`.

**Test ordering.** Tests pass alone, fail when run with siblings.

| Signal | What's happening |
|---|---|
| `npm test -- --testNamePattern='^X$'` passes; full run fails | Test relies on state from a previously-run test. |
| Adding a new test breaks an unrelated existing one | Implicit ordering dependency exposed by the new test pushing the old test into a different position. |
| Random-order test runners (Jest `randomize`) flag the suite | Suite is order-dependent. |
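
The signals above share one cause: a test reads state that another test wrote. A minimal sketch of that mechanism, using a hypothetical in-file runner (`runInOrder`) and a hypothetical shared `cart` fixture rather than a real test framework:

```typescript
type TestCase = { name: string; run: () => boolean };

// Hypothetical shared fixture, standing in for module-level state.
let cart: string[] = [];

const tests: TestCase[] = [
  { name: "adds an item", run: () => { cart.push("apple"); return cart.length === 1; } },
  { name: "starts empty", run: () => cart.length === 0 },
];

// Runs the tests in the given order and returns the names of the failures.
// resetBeforeEach models a beforeEach hook that rebuilds the fixture.
function runInOrder(order: TestCase[], resetBeforeEach: boolean): string[] {
  cart = [];
  const failures: string[] = [];
  for (const t of order) {
    if (resetBeforeEach) cart = [];
    if (!t.run()) failures.push(t.name);
  }
  return failures;
}

const forward = runInOrder(tests, false);                 // ["starts empty"]
const reversed = runInOrder([...tests].reverse(), false); // []
const isolated = runInOrder(tests, true);                 // []
```

The same two tests pass or fail depending only on position, which is why running in a second order exposes the dependency and a per-test reset removes it.
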
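Remediation:

- Run the suite in a randomized or reversed order to surface hidden dependencies (`jest --randomize`, `pytest --random-order`, `mocha --sort reverse`).
- Reset state in `beforeEach` / `afterEach`; never rely on `beforeAll` for state that the test mutates.

**Shared parallel state.** Tests pass sequentially, fail when run in parallel workers.
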
| Signal | What's happening |
|---|---|
| Fails ~50% of runs in CI matrix; passes locally with `-j 1` | Two workers writing to the same DB row / file / port. |
| Fails more often as worker count goes up | Linear shared-state contention. |
| Error message mentions "duplicate key" / "address in use" / "file already exists" | Direct collision evidence. |
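
Collisions like these usually disappear once every shared resource name is derived from the worker index. The sketch below assumes the runner exposes such an index (Jest, for example, sets `JEST_WORKER_ID`); `resourcesFor`, `basePort`, and `portsPerWorker` are hypothetical names and defaults chosen for illustration.

```typescript
interface WorkerResources {
  schema: string;    // candidate value for a per-worker DB schema
  tmpDir: string;    // candidate value for a per-worker temp dir
  portStart: number; // first port reserved for this worker
  portEnd: number;   // last port reserved for this worker (inclusive)
}

// basePort and portsPerWorker are illustrative defaults, not values from any tool.
function resourcesFor(worker: number, basePort = 20000, portsPerWorker = 100): WorkerResources {
  if (!Number.isInteger(worker) || worker < 0) {
    throw new RangeError(`worker index must be a non-negative integer, got ${worker}`);
  }
  const portStart = basePort + worker * portsPerWorker;
  return {
    schema: `test_${worker}`,
    tmpDir: `/tmp/test-${worker}`,
    portStart,
    portEnd: portStart + portsPerWorker - 1,
  };
}

const worker0 = resourcesFor(0);
const worker1 = resourcesFor(1);
```

Because each range is computed from the index alone, two workers cannot be handed the same schema, directory, or port as long as their indices differ.
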
Remediation:

- Run the `parallel-isolation-checker` agent to find shared state.
- Give each worker its own resources: per-worker DB schemas (`PG_SCHEMA=test_${WORKER}`), per-worker temp dirs (`TMPDIR=/tmp/test-${WORKER}`), per-worker port ranges.

**Resource leaks.** Tests pass on a fresh machine, fail after the test process has run for hours.

| Signal | What's happening |
|---|---|
| Fails increasingly often as suite duration grows | Memory or file-descriptor leak in the test setup. |
| `EMFILE` / `EADDRINUSE` errors mid-suite | File-descriptor or port exhaustion. |
| Long-running processes (Playwright browsers, Cypress runners) crash mid-suite | Process accumulating zombies. |
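
Leaks of this kind often start with cleanup code that is skipped when a test throws. A minimal sketch of the difference, using a hypothetical `FakeServer` that counts open instances in place of a real browser or HTTP server:

```typescript
// Stand-in for a browser or HTTP server; counts how many instances are still open.
class FakeServer {
  static openCount = 0;
  constructor() { FakeServer.openCount += 1; }
  close(): void { FakeServer.openCount -= 1; }
}

// Cleanup after the body: skipped whenever the body throws.
function leaky(body: (s: FakeServer) => void): void {
  const server = new FakeServer();
  body(server);
  server.close();
}

// Cleanup in finally: runs whether or not the body throws.
function safe(body: (s: FakeServer) => void): void {
  const server = new FakeServer();
  try {
    body(server);
  } finally {
    server.close();
  }
}

function failingTest(): void {
  throw new Error("assertion failed");
}

// Returns how many servers a runner leaves open after a failing test.
function leakedAfter(runner: (body: (s: FakeServer) => void) => void): number {
  const before = FakeServer.openCount;
  try {
    runner(failingTest);
  } catch {
    // The test failure itself is expected here.
  }
  return FakeServer.openCount - before;
}

const leakyGrowth = leakedAfter(leaky); // 1
const safeGrowth = leakedAfter(safe);   // 0
```

One leaked handle per failing test is invisible in a short run and becomes descriptor or port exhaustion in a long one.
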
Remediation:

- Always call `await browser.close()` / `await server.close()` in `afterAll`, with a `try/finally` so failed tests still clean up.
- Set explicit test timeouts so a hung test cannot hold resources indefinitely (`--testTimeout`, `test.setTimeout()`).
- Record `lsof | wc -l` and `ps aux | wc -l` before / after the suite in CI to detect leaks; alert when growth exceeds a threshold.

**Network.** Tests pass when the upstream is healthy, fail otherwise.

| Signal | What's happening |
|---|---|
| Fails on the same handful of tests that hit the same external URL | Real network call to a flaky third party. |
| Fails right after a deploy of a non-test service | Test is hitting prod / staging of a sibling service. |
| `ETIMEDOUT` / `ECONNRESET` in error logs | Network-layer error, not test-logic error. |
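
When triaging a batch of failures, the error-code signal above can be checked mechanically. The sketch below groups failing tests by the network-layer code found in their logs; the `Failure` shape and `networkSuspects` are hypothetical, and the code list contains only the two codes named in the table.

```typescript
interface Failure {
  test: string; // test name
  log: string;  // captured error output
}

// Only the two codes named in the table; extend as needed.
const NETWORK_ERROR_CODES = ["ETIMEDOUT", "ECONNRESET"] as const;

// Maps each network-layer error code to the tests whose logs mention it.
function networkSuspects(failures: Failure[]): Map<string, string[]> {
  const byCode = new Map<string, string[]>();
  for (const failure of failures) {
    for (const code of NETWORK_ERROR_CODES) {
      if (failure.log.includes(code)) {
        const tests = byCode.get(code) ?? [];
        tests.push(failure.test);
        byCode.set(code, tests);
      }
    }
  }
  return byCode;
}

const suspects = networkSuspects([
  { test: "checkout total", log: "Error: connect ETIMEDOUT 203.0.113.7:443" },
  { test: "login form", log: "expected 'Welcome' but received 'Sign in'" },
  { test: "price feed", log: "Error: read ECONNRESET" },
]);
```

Tests that appear in the result are candidates for the network pattern; tests that do not, like the hypothetical "login form" failure, need a different category.
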
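Remediation:

- Stub third-party and sibling-service calls at the network boundary, for example with Playwright's `page.route()`.

**Locator drift.** UI tests pass when the page looks one way, fail when it shifts.
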
| Signal | What's happening |
|---|---|
| Fails after an unrelated CSS change | Selector matched by position rather than identity. |
| `selector matched 2 elements` errors | Ambiguous selector now matches more than one node. |
| Fails only at certain viewports | Layout shifts cause mobile / desktop selectors to differ. |
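
The difference between matching by position and matching by identity can be shown on a plain array of elements. This is a simplified model rather than Playwright's locator engine; `El`, `firstMatch`, and `strictMatch` are hypothetical names.

```typescript
interface El {
  role: string;
  name: string;
}

// Lenient lookup: silently returns the first match, whatever its position.
function firstMatch(els: El[], pred: (e: El) => boolean): El | undefined {
  return els.find(pred);
}

// Strict lookup: anything other than exactly one match is an error.
function strictMatch(els: El[], pred: (e: El) => boolean): El {
  const matches = els.filter(pred);
  if (matches.length !== 1) {
    throw new Error(`expected exactly 1 match, found ${matches.length}`);
  }
  return matches[0];
}

// After an unrelated change, a second button appears ahead of "Submit".
const page: El[] = [
  { role: "button", name: "Cancel" },
  { role: "button", name: "Submit" },
];

const byPosition = firstMatch(page, (e) => e.role === "button");                         // Cancel
const byIdentity = strictMatch(page, (e) => e.role === "button" && e.name === "Submit"); // Submit
```

The lenient lookup keeps "passing" while clicking the wrong button; the strict one turns the same ambiguity into an immediate, explainable failure.
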
Remediation:

- Prefer role-based locators (`page.getByRole('button', { name: 'Submit' })`), then `data-testid`, only `text=` / CSS as a last resort.
- Keep `strict: true` so any ambiguous selector fails immediately rather than silently picking the first match.
- Exercise viewport-sensitive pages with `responsive-breakpoint-runner`; visual signal exposes layout-shift flakes faster than text checks.

**Environment variance.** Tests pass on Linux CI, fail on macOS dev machines (or vice versa).

| Signal | What's happening |
|---|---|
| Fails only on a specific CI runner / OS | OS-specific path separator, line ending, or filesystem case sensitivity. |
| Snapshot tests fail with sub-pixel diffs across OS | OS font / anti-aliasing differences (see playwright-snapshots). |
| Fails in `TZ` configurations not set to UTC | Timezone-sensitive assertion. |
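
The timezone row is easy to reproduce with a single instant close to midnight UTC. The sketch below uses the standard `Intl.DateTimeFormat` API; the instant and the helper name `calendarDay` are arbitrary choices for illustration.

```typescript
// Formats the calendar day of an instant as seen from a given IANA time zone.
function calendarDay(instant: Date, timeZone: string): string {
  return new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(instant);
}

// Half an hour past midnight UTC on New Year's Day.
const instant = new Date("2024-01-01T00:30:00Z");

const utcDay = calendarDay(instant, "UTC");                 // "01/01/2024"
const laDay = calendarDay(instant, "America/Los_Angeles");  // "12/31/2023"
```

An assertion on the rendered date passes on a UTC runner and fails on a laptop set to Pacific time, even though both saw the same instant.
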
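Remediation:

- Pin the timezone (`TZ=UTC`) for deterministic runs.
- Handle cross-OS snapshot differences as described in playwright-snapshots.
- Build paths with `path.posix.join()` / `node:path` rather than hard-coded separators.

**Randomness.** Tests use random data without a controlled seed.
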
| Signal | What's happening |
|---|---|
| Failures don't reproduce on retry | Test data was randomized; the failing combination is gone. |
| Test asserts a property that holds "almost always" | Property-based test exposing a real edge case (this is good - fix the production bug). |
| Faker-generated data triggers a layout overflow | Random string longer than the assertion expected. |
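
Failures that vanish on retry become reproducible once the random source is seeded and the seed is recorded. The sketch below uses a small self-contained generator (the widely used mulberry32 routine) in place of a seeding library; `randomLabel` is a hypothetical stand-in for generated test data.

```typescript
// mulberry32: a small seedable PRNG returning floats in [0, 1).
// Used here only to keep the example dependency-free.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical test-data generator: a lowercase label of 1 to 40 characters.
function randomLabel(rand: () => number): string {
  const length = 1 + Math.floor(rand() * 40);
  let label = "";
  for (let i = 0; i < length; i++) {
    label += String.fromCharCode(97 + Math.floor(rand() * 26));
  }
  return label;
}

const seed = 12345; // in a real run, generate once and log it on failure
const firstRun = randomLabel(mulberry32(seed));
const replay = randomLabel(mulberry32(seed));
```

Replaying with the logged seed regenerates the exact label that caused the failure, so the failing combination is no longer lost.
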
Remediation:

- Seed every source of randomness: `Math.random` via `seedrandom`, faker via `faker.seed(N)`, property-based testing via `fc.assert(prop, { seed })`.
- Log the seed with each failure so the failing input can be replayed (see `bug-repro-builder`).

**Decision tree.**

    Test fails ~50% of runs?
    ├── Yes → likely "shared parallel state" or "test ordering"
    └── No → fails ~5–20% of runs?
        ├── Yes → likely "async/timing" or "network"
        └── No → fails only on specific OS / runner?
            ├── Yes → "environment variance"
            └── No → fails after long suite duration?
                ├── Yes → "resource leaks"
                └── No → fails after unrelated UI change?
                    ├── Yes → "locator drift"
                    └── No → does the test use random data?
                        ├── Yes → "randomness"
                        └── No → bisect with `e2e-flake-bisector`

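
The tree above can also be encoded as a function, which is convenient when triage notes are collected in a structured form. This is a sketch: the `FlakeObservation` fields are hypothetical, and the numeric bands are assumptions chosen to approximate "~50%" and "~5–20%".

```typescript
interface FlakeObservation {
  failureRate: number;              // fraction of failing runs, 0..1
  onlyOnSpecificOs: boolean;
  growsWithSuiteDuration: boolean;
  followsUnrelatedUiChange: boolean;
  usesRandomData: boolean;
}

// Walks the questions in the same order as the tree and returns the
// candidate patterns; an empty result means "bisect".
function likelyPatterns(o: FlakeObservation): string[] {
  // Assumed band for "~50%".
  if (o.failureRate >= 0.4 && o.failureRate <= 0.6) {
    return ["shared parallel state", "test ordering"];
  }
  // Assumed band for "~5-20%".
  if (o.failureRate >= 0.05 && o.failureRate <= 0.2) {
    return ["async/timing", "network"];
  }
  if (o.onlyOnSpecificOs) return ["environment variance"];
  if (o.growsWithSuiteDuration) return ["resource leaks"];
  if (o.followsUnrelatedUiChange) return ["locator drift"];
  if (o.usesRandomData) return ["randomness"];
  return [];
}

const none = {
  failureRate: 0.01,
  onlyOnSpecificOs: false,
  growsWithSuiteDuration: false,
  followsUnrelatedUiChange: false,
  usesRandomData: false,
};

const coinFlip = likelyPatterns({ ...none, failureRate: 0.5 });
const macOnly = likelyPatterns({ ...none, onlyOnSpecificOs: true });
const unknown = likelyPatterns(none);
```

Like the tree, the function returns a starting hypothesis, not a diagnosis; the failure-rate checks run first, so a 50% failure on a single OS is still reported as a parallelism or ordering candidate.
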
For systematic bisection, hand the test off to the
e2e-flake-bisector agent,
which varies one axis at a time per the patterns above.
- `flaky-test-quarantine` - workflow that uses this catalog during triage.
- `e2e-flake-bisector`, `parallel-isolation-checker`, `regression-bisector` - agents that implement the per-pattern detection.