Skill

design-iteration-wisdom

Renders UI changes and judges layout from actual pixels, not CSS tokens. Diagnoses spacing, grouping, visual hierarchy, and alignment issues. Supports reference-match mode where a user-provided screenshot is the spec.

CSS

HTML

design

frontend

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/claude-resources:design-iteration-wisdom

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You cannot reliably judge a rendered layout from class names and design tokens.

Supporting Files

references/design-knowledge.md

SKILL.md

294 lines · ~4.3k tokens

Stats

LanguageShell

Stars10

Forks2

MaintenanceExcellent

Last CommitJun 16, 2026

Actions

View Source View Plugin View on GitHub View README

Design Iteration Wisdom

Why this skill exists

You cannot reliably judge a rendered layout from class names and design tokens. Whether spacing reads as "tight" or "loose" is a property of the rendered composition — the emergent result of token → utility → responsive variant → flex/grid resolution → content (image aspect, wrapped text) → browser layout. You only ever edit the top of that chain; the look only exists at the bottom, after rendering. Reasoning from tokens alone produces confident, wrong spacing — it can yield a too-tight AND a too-loose version from the same rule, because the rule was about magnitude and the problem is about relationships.

The fix is a loop: render it, look at the pixels, diagnose in real design terms by combining the project's design-system rules with general design knowledge, change code, look again. This skill is the judgment layer; the project's design-system skill is the source of allowed values. Use both.

But render-and-look only works if you look at the right thing. There are two different targets you can be judging against:

An aesthetic rule you infer — "this looks tight / unbalanced / heavy." The bitmap is evidence; you supply the rule (Gestalt grouping, hierarchy).
A concrete reference the user handed you — an annotated screenshot, a mockup, "make it match this." The bitmap is the rule; your job is to reproduce it, not to re-derive taste.

The loop (render → look) is identical for both, but the second has a failure mode this skill exists to stop and previously did not name: you render your change, look at your own output, and confirm it against your mental model of the request instead of against the reference's actual pixels — so a confident "looks right ✓" ships a result the user can see is wrong at a glance. You had the same bitmap they did. If they can spot the error instantly, the information was in the image; you compared it to the wrong thing. Reference-match mode has its own discipline — see "Matching a reference" below — and it is not optional whenever a reference exists.

When to use

A user calls a layout "tight" / "cramped" / "loose" / "too much space" / "off" / "doesn't breathe" / "unbalanced" / "messy".
You just implemented a UI/CSS/layout change and need it to look right, not only pass build + unit tests. (Passing tests proves it works, never that it looks good.)
You are refining spacing, grouping, visual hierarchy, alignment, or balance.
You catch yourself about to decide a spacing/layout question by reading class strings or a token table instead of seeing the rendered result. That instinct is the trigger.
The user handed you a visual reference — an annotated screenshot, a mockup, a marked-up capture (a drawn line, "green = must have, red = must not"), or "make it match this." The task is no longer taste; it is reproducing a spec from a bitmap. This is reference-match mode — read "Matching a reference" before you write any code, and again before you claim it is done.

The core principle (the crux — internalize this)

Spacing encodes grouping. Bind tight, separate loose. Things that belong together get less space between them; things that are distinct get more. The contrast (ratio) between hierarchy levels is the design — not the absolute size.

"Tight" usually means too little contrast — every gap is similar, so the eye finds no groups and reads a uniform wall. The cure is rarely "more space everywhere." It is more contrast: often keep (or tighten) the within-group gaps and enlarge the between-group / around-heading gaps.
"Loose" usually means oversized binders — the within-group gaps grew until the group dissolved. The cure is to shrink the binders, not the separators.
Never uniformly scale something hierarchical. Uniform small = tight. Uniform large = loose/broken. Both destroy hierarchy.
On larger viewports, escalate the separators, hold the binders. "Breathing room" on desktop comes from bigger gaps around groups, while the gaps within a tight list stay tight. Inflating within-group spacing on desktop is the classic way to make a page look broken.

For the full vocabulary (Gestalt grouping, type, alignment, visual weight, the symptom→cause→fix cheatsheet, and how to measure) read references/design-knowledge.md during the Diagnose step.

Matching a reference (annotated capture / mockup)

When the user gives a concrete visual reference, the reference is the spec. Success is not "does my result look good / look like a drawer / feel right" — it is "does my rendered result match the reference, element by element." The aesthetic-judgment half of this skill barely applies; this is reproduction, and it has its own discipline because it has its own failure mode (the one that created this section): rendering your change, looking at it, and confirming it against your own restatement of the request instead of against the reference.

Run these, in order — and treat skipping any one as the bug:

Read the reference as presence AND absence. A positional annotation is a complete statement, not a hint. A green line drawn along the bottom edge says "a border goes here" and, by every edge it does not cover, "no border anywhere else." Marking where something is also marks where it isn't. Before coding, write the spec out both ways: what the reference shows present and what it shows absent. The classic, exact miss that motivated this section: reading "border here" and silently adding borders (a full perimeter) the reference never drew.
Use the user's annotation legend verbatim. They often state it ("green = must have, red = must not have"; arrows = position; a box = the element in question). Apply that meaning literally. If a mark is genuinely ambiguous, ask one question — do not paper over it with a guess.
Verify by DIFF, never by self-judgment. Render your result at the same framing as the reference (same region, same zoom — crop to it, even with a sharp/Playwright crop), set the two images side by side, and walk every element: present-in-both, absent-in-both, present-in-one-only. Every "present in only one" is a defect — including elements present in yours but not the reference (the added border, the extra rule, the shifted corner). "Does this look right?" judged on your render alone is the trap; the only question is "does this match that?"
Do not author your own pass/fail checks from your interpretation. A checklist you wrote from your reading ("tab has no top border ✓") can only confirm your reading — it passes while the result is wrong, because the misread is upstream of the check. With a reference, the sole valid check is "matches the reference." If you delegate verification to a subagent, give it the reference image and tell it to diff against it; never hand it your paraphrased checklist (that just launders your misinterpretation through a second agent).
On "still wrong," re-derive from the reference — don't patch your model. When feedback says it's still off, the reflex is the smallest edit to your existing output (you remove the line across the tab but keep the perimeter you should never have added). Stop. Re-extract the spec from the reference from scratch, as if seeing it for the first time. Repeated feedback on the same element means your underlying model is wrong, not one step short — a patch on a wrong model stays wrong.

The loop

Step 0 — Detect project context

Design-system rules / tokens: look for a project skill (e.g. /l-design-system, or any *design-system* / *design-tokens* skill or doc). Invoke/read it — it defines the allowed values. Never invent raw px/rem when a token system exists. If none exists, use the values already present in the codebase as the de-facto scale.
Dev-serve command: check package.json scripts and the project's CLAUDE.md / README for the dev or serve command and port (e.g. pnpm dev, npm run dev). Reuse an already-running server if one is up — don't spawn a duplicate.
Target URL(s): the specific route(s) you are iterating on.

Step 1 — Serve

Start the dev/preview server (or confirm one is running) and get the URL. For gated preview deployments, note any auth cookie/header the project documents.

Step 2 — Screenshot at multiple breakpoints

Use /headless-browser to capture the target at the breakpoints that matter — at minimum mobile (~390px), mid (~760px), and desktop (~1300px). Responsive escalation rules can only be checked across widths; a single screenshot will mislead you.

Capture the worst case, not the page top. Layout breaks hide in the extreme content: scroll to the item with the longest title / most wrapping text, the longest list, AND the shortest / near-empty group — at the narrowest viewport. A clean top-of-page screenshot routinely hides a row whose title collapses to one character per line three screens down, or a short section drowning in separator space. Screenshot the region in question with its worst-case data, not whatever renders first.

Step 3 — Look, then measure

Look at each screenshot with your own vision. Describe honestly what you see at each width: where do groups blur together? where does something float apart? what is misaligned or visually heavy? If a reference exists, looking is comparative, not solitary — crop your render to the reference's framing and diff the two element-by-element (see "Matching a reference"). Never judge your render in isolation when you have a spec to match it against.
Measure when a judgment is close or you want an objective check. Get the real rendered pixel gaps (not nominal token values) and compute the contrast ratios — via /verify-ui (computed styles) or a Playwright bounding-box snippet (see references/design-knowledge.md → "Measuring rhythm"). Numbers turn "does it look grouped" into arithmetic you can verify.
Some claims are unmeasurable by eye — color is the clearest case, and the eye will lie to you confidently. A muted off-white line (rgb(214,211,209)) and pure white (rgb(255,255,255)) look identical on a dark background: the eye normalizes "light line on dark" to "white." So a screenshot does not settle a color question — you will look at the muted line and call it white. The same goes for any near-threshold value: a 1px vs 2px border, exact alignment, an opacity, a token that resolves to almost-but-not-the-target. Never report a color or exact value from looking. Sample it — and prefer the computed style over pixels. For anything that is a real DOM element with the color set in CSS (a border, a background, text), getComputedStyle(el).borderColor / .backgroundColor is authoritative: it's the actual resolved value, and it skips the entire fragile pixel harness below. Query the element first. Fall back to rendered pixel RGB from a screenshot only when there is no single element to ask — the "rule" is really a pseudo-element / box-shadow, or you're checking an image, a gradient, or an antialiased edge. Either way, compare to the target number. "It looks white ✓" is not a verification; "the computed border-color is rgb(255,255,255) ✓" is. If the user insists a color/value is wrong and you "can't see it," that is the tell that you are eyeballing something only measurement resolves — stop looking and sample.
- When you must sample pixels, the harness itself can lie — make it robust. A naive pixel probe produces false "muted/absent" readings that send you chasing ghosts: (a) Playwright's clip is CSS pixels, but the saved image is device pixels — don't multiply clip coords by devicePixelRatio (you'll sample the wrong place); map back by dividing the read coords. (b) A 1px line sits at a sub-pixel y, so a 1px-tall strip routinely misses it — sample a band (e.g. ±5px around the border) and take, per column, the pixel furthest from the background color (on a dark bg that's the brightest pixel; on a light bg the darkest — never hard-code "brightest" or a dark rule on a light bg samples the background and reads as absent). (c) Sample a text-free region — text glyphs in the foreground color contaminate the furthest-from-bg pixel and read as a false line. When two probes disagree, the naive one is wrong; trust the robust band-scan — but remember (a)–(c) only arise because you fell back to pixels; a queryable element never needs them.

Step 4 — Diagnose (synthesize, don't guess)

Load references/design-knowledge.md and name the problem in design terms — grouping/contrast, hierarchy, alignment, weight, type measure — mapped onto the project's token scale. Bad: "the rows need more spacing." Good: "within-series row gaps ≈ between-series gaps → no grouping contrast; tighten rows and raise the between-series gap a couple of steps on the scale." The diagnosis is the synthesis of general principle + project value.

Step 5 — Fix in code

Apply the change using project tokens/components. Keep changes minimal and targeted to what the diagnosis identified.

Step 6 — Re-screenshot and compare

Repeat Steps 2–5. Confirm the fix did what the diagnosis predicted and didn't break grouping at another breakpoint. Usually 2–4 rounds converge. In reference-match mode there is no "diagnosis" to confirm — re-crop and diff against the reference, and if the same element is still wrong, re-derive the spec from the reference rather than patching your model (see "Matching a reference" → the repeated-feedback rule).

Step 7 — Stop and hand to the human

Design has a taste component an AI approximates but doesn't own. When the computable invariants pass and the screenshots read cleanly at every width, stop and show the human the before/after screenshots for the final call. Don't loop indefinitely chasing a verdict only a person can give.

In reference-match mode the bar is different — not taste but fidelity. Stop when your render matches the reference element-for-element (presence and absence), and show the human your result next to their reference, not just a before/after of your own work, so the final check is the same direct comparison you just made.

Computable invariants (your eyeless self-check)

Even without perfect perception you can check these against measured px:

Contrast: between-group gap ≳ 2× within-group (sibling) gap.
Cohesion: within-group sibling gap ≤ the item's own line-height / height (so siblings still read as one group).
Monotonic hierarchy: spacing strictly increases as you move up the grouping tree (atom < item < sibling-gap < group < section).

If any fails, the layout will read as tight (low contrast), loose (binder too big), or chaotic (non-monotonic) — regardless of how "generous" the absolute values are.

Reference fidelity (when a reference exists — the strongest invariant): every element the reference marks present is present; every region it leaves unmarked is unchanged/absent; and nothing exists in your render that the reference does not show. An added border, rule, box, or shifted edge that the reference never contained is a defect even when it "looks fine" on its own — extra is wrong, not just missing. This check is interpretation-free, so it catches the misreads your own paraphrased checklists never will.

design-iteration-wisdom

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

design-iteration-wisdom

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Design Iteration Wisdom

Why this skill exists

When to use

The core principle (the crux — internalize this)

Matching a reference (annotated capture / mockup)

The loop

Step 0 — Detect project context

Step 1 — Serve

Step 2 — Screenshot at multiple breakpoints

Step 3 — Look, then measure

Step 4 — Diagnose (synthesize, don't guess)

Step 5 — Fix in code

Step 6 — Re-screenshot and compare

Step 7 — Stop and hand to the human

Computable invariants (your eyeless self-check)

Similar Skills

Design Iteration Wisdom

Why this skill exists

When to use

The core principle (the crux — internalize this)

Matching a reference (annotated capture / mockup)

The loop

Step 0 — Detect project context

Step 1 — Serve

Step 2 — Screenshot at multiple breakpoints

Step 3 — Look, then measure

Step 4 — Diagnose (synthesize, don't guess)

Step 5 — Fix in code

Step 6 — Re-screenshot and compare

Step 7 — Stop and hand to the human

Computable invariants (your eyeless self-check)

Similar Skills