From qa-test-data
Reference catalog for snapshot / golden file management - naming conventions, directory layout, when to add / update / remove a baseline, sanitization (timestamps, IDs, PII), per-OS / per-runtime variant strategy, and review workflow for snapshot diffs in PRs. Use when designing a snapshot-testing convention or auditing an existing one for drift.
Slash command: `/qa-test-data:golden-file-conventions`

> **Terminology note:** "golden file" / "golden master" are practitioner-emergent terms popularized by the *Working Effectively with Legacy Code* tradition. ISTQB has no canonical entry - the closest formal term is "snapshot test." This catalog uses both interchangeably; assume "golden file" and "snapshot" mean the same thing in the rest of the body.
A reference catalog for how to manage snapshot / golden files. Pairs with `golden-file-manager`, the active management agent that updates / prunes golden files based on these conventions.
Most snapshot frameworks (Jest, Vitest, pytest-snapshot, RSpec Snapshot) use a path adjacent to the test file:
    src/
      components/
        Button.tsx
        Button.test.tsx
        __snapshots__/
          Button.test.tsx.snap
Convention: one snapshot file per test file, named `<test-file-name>.snap`. Do not split the snapshots of one test file across multiple snapshot files.
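The adjacent-path convention can be sketched as a small mapping from test file to snapshot file. This is a minimal illustration, not any framework's resolver; the helper name `snapshotPathFor` is hypothetical and it assumes forward-slash paths.

```typescript
// Hypothetical helper: derives the adjacent snapshot path for a test file.
// Assumes POSIX-style "/" separators.
function snapshotPathFor(testFilePath: string): string {
  const lastSlash = testFilePath.lastIndexOf("/");
  // Directory part including the trailing slash, or "" for a bare file name.
  const dir = lastSlash === -1 ? "" : testFilePath.slice(0, lastSlash + 1);
  const fileName = testFilePath.slice(lastSlash + 1);
  // One snapshot file per test file: <test-file-name>.snap
  return `${dir}__snapshots__/${fileName}.snap`;
}

console.log(snapshotPathFor("src/components/Button.test.tsx"));
// src/components/__snapshots__/Button.test.tsx.snap
```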
Inside a `.snap` file, each snapshot is keyed by its `<describe> > <it>` chain:

    exports[`Button renders with primary variant 1`] = `<button class="primary">...</button>`;
The trailing `1` is the snapshot index, which increments when one test takes multiple snapshots. Keep these to a minimum (≤3 per test); beyond that, split the test.
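An audit of the per-test snapshot count could look roughly like the sketch below. It assumes keys end in a space followed by the numeric index, as in the example above; the function names are hypothetical.

```typescript
// Splits a key such as "Button renders with primary variant 1"
// into the test name and the trailing snapshot index.
function parseSnapshotKey(key: string): { testName: string; index: number } {
  const match = /^(.*) (\d+)$/.exec(key);
  if (!match) throw new Error(`Unrecognized snapshot key: ${key}`);
  return { testName: match[1], index: Number(match[2]) };
}

// Returns the names of tests that take more snapshots than the limit
// (3 by default, following the guideline in the text).
function testsOverLimit(keys: string[], limit = 3): string[] {
  const counts = new Map<string, number>();
  for (const key of keys) {
    const { testName } = parseSnapshotKey(key);
    counts.set(testName, (counts.get(testName) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count > limit)
    .map(([name]) => name);
}
```

Feeding it the keys of one `.snap` file yields the tests that are candidates for splitting.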
For visual / screenshot-based snapshots, the name carries the platform suffix (per `playwright-snapshots`):

    Button-primary-1-chromium-linux.png
    Button-primary-1-firefox-linux.png
    Button-primary-1-webkit-darwin.png
OS / browser suffixes are load-bearing - anti-aliasing and font metrics differ. Don't strip them.
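The naming pattern in the examples can be expressed as a small builder. This is a sketch of the convention only, not Playwright's own path logic; the type and function names are hypothetical.

```typescript
type Browser = "chromium" | "firefox" | "webkit";
type Platform = "linux" | "darwin" | "win32";

// Builds <base>-<index>-<browser>-<platform>.png, keeping both suffixes
// so that each browser / OS pair gets its own baseline.
function visualSnapshotName(
  base: string,
  index: number,
  browser: Browser,
  platform: Platform,
): string {
  return `${base}-${index}-${browser}-${platform}.png`;
}

console.log(visualSnapshotName("Button-primary", 1, "webkit", "darwin"));
// Button-primary-1-webkit-darwin.png
```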
| Layout | When to use |
|---|---|
| Adjacent (`__snapshots__/` next to test) | Default. Reviewer sees the diff in the same PR view as the test. |
| Centralized (`tests/__fixtures__/`) | Cross-test fixtures (golden inputs reused by many tests). |
| External (`s3://snapshots-bucket/`) | Visual snapshots that are large; CI uploads / downloads. Common with Percy, Chromatic, Playwright + S3. |
Default to adjacent. Centralized only when fixtures are reused. External only when artifact size makes adjacent impractical.
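The layout rule can be written down as a tiny decision function. The size threshold below is an assumption for illustration (the text only says "large"), and giving size priority over reuse is also an assumption.

```typescript
type Layout = "adjacent" | "centralized" | "external";

interface SnapshotTraits {
  reusedAcrossTests: boolean;
  sizeBytes: number;
}

// Hypothetical cutoff: pick a value that matches your repository's limits.
const EXTERNAL_SIZE_THRESHOLD_BYTES = 1024 * 1024;

function chooseLayout(traits: SnapshotTraits): Layout {
  // External only when artifact size makes an in-repo file impractical.
  if (traits.sizeBytes > EXTERNAL_SIZE_THRESHOLD_BYTES) return "external";
  // Centralized only when the fixture is shared by several tests.
  if (traits.reusedAcrossTests) return "centralized";
  // Everything else stays next to its test.
  return "adjacent";
}
```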
Add a snapshot when:
Don't add a snapshot for:
A snapshot that contains volatile values (timestamps, UUIDs, random IDs, current dates) breaks every run. Sanitize before snapshotting:
| Volatile field | Sanitization pattern |
|---|---|
| Timestamps | Replace with a fixed string `[TIMESTAMP]` or freeze the clock (in Vitest, `vi.useFakeTimers()` plus `vi.setSystemTime()` to pin the date). |
| UUIDs | Replace with `[UUID]` or seed a deterministic generator. |
| Auto-increment IDs | Replace with `[ID]` or use a sequence-controlled fixture. |
| File paths (`/var/folders/...`) | Replace with `[PATH]` or normalize via project root. |
| Memory addresses (object refs) | Avoid in serialized output; use a custom serializer. |
| User-data tokens | Strip before snapshotting; tokens shouldn't be in the test surface anyway. |
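A string-level sanitizer for some of the rows above might look like the following sketch. The regular expressions cover only canonical UUIDs and ISO 8601 timestamps, and the `sanitize` name and `projectRoot` parameter are hypothetical; real output may need more patterns.

```typescript
// Canonical 8-4-4-4-12 hex UUIDs.
const UUID_PATTERN = /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi;
// ISO 8601 timestamps such as 2024-01-02T03:04:05.678Z.
const ISO_TIMESTAMP_PATTERN = /\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?/g;

// Replaces volatile values with stable placeholders before snapshotting.
function sanitize(output: string, projectRoot: string): string {
  // Normalize machine-specific absolute paths via the project root.
  const withPaths = projectRoot.length > 0
    ? output.split(projectRoot).join("[PATH]")
    : output;
  return withPaths
    .replace(UUID_PATTERN, "[UUID]")
    .replace(ISO_TIMESTAMP_PATTERN, "[TIMESTAMP]");
}

const raw =
  '{"id":"123e4567-e89b-12d3-a456-426614174000","createdAt":"2024-01-02T03:04:05.678Z","file":"/home/ci/app/src/a.ts"}';
console.log(sanitize(raw, "/home/ci/app"));
// {"id":"[UUID]","createdAt":"[TIMESTAMP]","file":"[PATH]/src/a.ts"}
```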
Most frameworks support custom serializers / matchers - use them. Jest's property-matcher pattern with `expect.any(Date)` is canonical:

    // Property matchers: each listed field is checked by type, not by value.
    expect(result).toMatchSnapshot({
      createdAt: expect.any(Date),
      uuid: expect.any(String),
    });

The property matchers are checked before the snapshot is written or compared, and the matcher is stored in place of the received value, so the snapshot shows `Any<Date>` rather than a specific timestamp.
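To make the mechanism concrete, here is a simplified model of what a property matcher does to the stored value. It is an illustration only, not Jest's implementation; the `Matcher` type and `applyPropertyMatchers` function are hypothetical.

```typescript
// A matcher checks a value and carries the label stored in its place.
type Matcher = { label: string; test: (value: unknown) => boolean };

const anyDate: Matcher = { label: "Any<Date>", test: (v) => v instanceof Date };
const anyString: Matcher = { label: "Any<String>", test: (v) => typeof v === "string" };

// Validates each matched field, then swaps the volatile value for the label.
function applyPropertyMatchers(
  received: Record<string, unknown>,
  matchers: Record<string, Matcher>,
): Record<string, unknown> {
  const stored: Record<string, unknown> = { ...received };
  for (const key of Object.keys(matchers)) {
    const matcher = matchers[key];
    if (!matcher.test(received[key])) {
      throw new Error(`Property "${key}" did not match ${matcher.label}`);
    }
    stored[key] = matcher.label;
  }
  return stored;
}

const stored = applyPropertyMatchers(
  { name: "Ada", createdAt: new Date(), uuid: "abc-123" },
  { createdAt: anyDate, uuid: anyString },
);
console.log(JSON.stringify(stored));
// {"name":"Ada","createdAt":"Any<Date>","uuid":"Any<String>"}
```

Because the label is stored instead of the value, the serialized result is identical on every run while the type check still guards against a missing or malformed field.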
When a snapshot diff appears in a PR:
    Is the diff explained by code changes in the same PR?
    ├── No  → REGRESSION; fix the code, do not update the snapshot.
    └── Yes → Did the diff align with the intent (described in the PR title)?
        ├── No  → REGRESSION (cascade from an unrelated change); investigate before updating.
        └── Yes → Is the diff isolated to the components the PR is supposed to change?
            ├── No  → INVESTIGATE: a CSS / token / shared-component change affected unrelated snapshots.
            └── Yes → UPDATE: run `--update-snapshots` and commit.
The most common review failure is rubber-stamping snapshot updates - accepting a 47-component diff because the PR title says "Refactor Button". The diff classifier in `golden-file-manager` implements this decision tree.
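The decision tree translates directly into a function over three yes/no answers. This sketch is not the `golden-file-manager` classifier itself; the field and function names are hypothetical, and the three answers are assumed to come from a reviewer or an upstream check.

```typescript
type Verdict = "REGRESSION" | "INVESTIGATE" | "UPDATE";

interface DiffContext {
  explainedByCodeChange: boolean;      // Is the diff explained by code in the same PR?
  matchesPrIntent: boolean;            // Does it align with the stated intent of the PR?
  isolatedToTargetComponents: boolean; // Is it confined to what the PR should change?
}

function classifySnapshotDiff(ctx: DiffContext): Verdict {
  if (!ctx.explainedByCodeChange) return "REGRESSION";
  if (!ctx.matchesPrIntent) return "REGRESSION";
  if (!ctx.isolatedToTargetComponents) return "INVESTIGATE";
  // Only a diff that passes all three questions is safe to accept.
  return "UPDATE";
}
```

A 47-component diff under a "Refactor Button" title would fail the third question and come back as `INVESTIGATE` rather than `UPDATE`.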
Every snapshot has an implicit severity:
| Tier | Behavior | Examples |
|---|---|---|
| Critical | Blocks merge on diff; requires explicit reviewer acceptance. | Production-shipped pages; payment flows; auth. |
| Standard | Blocks merge on diff; author can self-approve with a clear PR description. | Internal admin tooling; non-shipping experiments. |
| Advisory | Surfaces diff but doesn't block. | Unstable areas under active redesign; new baselines during ramp-up. |
Promote Advisory → Standard after ~2 weeks of stability. Promote Standard → Critical for security-sensitive surfaces.
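The tier behaviors and the promotion rule can be sketched as two small predicates. The names are hypothetical, "~2 weeks" is taken as 14 days, and a reviewer's acceptance is assumed to also satisfy the Standard tier.

```typescript
type Tier = "critical" | "standard" | "advisory";

interface Approval {
  reviewerAccepted: boolean;   // explicit acceptance by a reviewer
  authorSelfApproved: boolean; // author approval backed by a clear PR description
}

// Returns true when a snapshot diff should block the merge.
function blocksMerge(tier: Tier, hasDiff: boolean, approval: Approval): boolean {
  if (!hasDiff) return false;
  if (tier === "critical") return !approval.reviewerAccepted;
  if (tier === "standard") {
    return !(approval.reviewerAccepted || approval.authorSelfApproved);
  }
  // Advisory diffs are surfaced but never block.
  return false;
}

// Assumption: "~2 weeks of stability" means 14 days without a diff.
const STABILITY_WINDOW_DAYS = 14;

function shouldPromoteAdvisory(daysSinceLastDiff: number): boolean {
  return daysSinceLastDiff >= STABILITY_WINDOW_DAYS;
}
```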
Remove a snapshot when:
The `golden-file-manager` agent automates the "test deleted but snapshot remained" cleanup.
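The "test deleted but snapshot remained" check can be approximated by comparing the snapshot files on disk with the ones the remaining tests would produce. This is a sketch under the adjacent-layout convention, not the agent's actual logic; both function names are hypothetical and the file lists are assumed to be collected elsewhere.

```typescript
// Expected adjacent snapshot path for a test file (POSIX-style separators).
function adjacentSnapshotPath(testFilePath: string): string {
  const lastSlash = testFilePath.lastIndexOf("/");
  const dir = lastSlash === -1 ? "" : testFilePath.slice(0, lastSlash + 1);
  return `${dir}__snapshots__/${testFilePath.slice(lastSlash + 1)}.snap`;
}

// Returns snapshot files that no existing test file accounts for.
function findOrphanedSnapshots(testFiles: string[], snapshotFiles: string[]): string[] {
  const expected = new Set(testFiles.map(adjacentSnapshotPath));
  return snapshotFiles.filter((snapshot) => !expected.has(snapshot));
}

console.log(
  findOrphanedSnapshots(
    ["src/components/Button.test.tsx"],
    [
      "src/components/__snapshots__/Button.test.tsx.snap",
      "src/components/__snapshots__/Modal.test.tsx.snap",
    ],
  ),
);
// [ 'src/components/__snapshots__/Modal.test.tsx.snap' ]
```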
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Updating snapshots in a separate "snapshot refresh" PR | Reviewer can't see the code change that justifies the diff. | Always update snapshots in the same PR as the source change. |
| `--update-snapshots` in PR CI as the default | Snapshots become tautologies; never catch a regression. | Update snapshots only in interactive runs; PR CI fails on diff. |
| Snapshotting raw HTML for components | Brittle to attribute-order changes from tooling upgrades. | Snapshot the React / Vue / Svelte component tree (e.g. react-test-renderer), not raw HTML; OR use a normalizer. |
| One mega-snapshot per page | A 5 KB diff is uninterpretable; reviewers approve to move on. | Per-component snapshots; smaller surface = faster review. |
| Storing snapshots externally without checksums | A drift in S3 vs. the test code makes "what changed?" hard. | Include checksums in the test code; verify on each run. |
| Snapshots of error messages with stack traces | Stack traces include line numbers that drift with every refactor. | Snapshot the error type + message only; strip the trace. |
| Cross-OS shared snapshots | Anti-aliasing / font / line-ending differences flake the test. | Per-OS snapshot suffixes (see naming above). |
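As one example of the fixes in the table, the stack-trace row can be handled by reducing an error to its type and message before snapshotting. A minimal sketch; the `snapshotableError` name is hypothetical.

```typescript
// Keeps only the stable parts of an error: its type and message.
// The stack is dropped because its line numbers drift with every refactor.
function snapshotableError(error: unknown): { type: string; message: string } {
  if (error instanceof Error) {
    return { type: error.name, message: error.message };
  }
  // Non-Error throwables (strings, numbers) are stringified as-is.
  return { type: typeof error, message: String(error) };
}

console.log(snapshotableError(new TypeError("bad input")));
// { type: 'TypeError', message: 'bad input' }
```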
- `golden-file-manager` - active-management agent that uses this catalog.
- `playwright-snapshots` - visual-snapshot-specific naming and per-OS suffix conventions.
- `visual-baseline-conventions` - broader visual-coverage conventions; this skill is the text/object-snapshot equivalent.

Install with `npx claudepluginhub testland/qa --plugin qa-test-data`.