From pipeline
Score design variants against a 9-dimension rubric as a fresh reviewer. Trigger in Phase 2 after the planner finishes producing variants, on explicit /design-critique <work-package-id>, or to score a rendered implementation against its approved mockup. Skip if no design system is configured.
How this skill is triggered — by the user, by Claude, or both
Slash command
/pipeline:design-critiqueThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
**Why:** The agent that created variants cannot objectively score them. Evaluation must be a different agent — the reviewer — reading the planner's output cold. True producer/evaluator separation: different personas, not just different cognitive modes. Run this on a fresh high-capability agent ({{models.review}}) that did not produce the variants.
Why: The agent that created variants cannot objectively score them. Evaluation must be a different agent — the reviewer — reading the planner's output cold. True producer/evaluator separation: different personas, not just different cognitive modes. Run this on a fresh high-capability agent ({{models.review}}) that did not produce the variants.
This is the evaluation counterpart to /design (production). The reviewer persona scores what the planner produced. /design generates variants; /design-critique scores them. The reviewer's context from this critique carries forward to the Phase 4 code review — it arrives already warm on the design decisions.
/design. Reviewer session./design-critique <work-package-id>: full audit of variants under the work package's design directory, e.g. .pipeline/work-packages/<id>/design/<variant-slug>/.If pipeline.config designSystem is null, stop — there is no design system to critique against.
Read the project's design-system conventions before critiquing. These live at {{designSystem.path}}:
{{designSystem.tokens}}.Deep reference, only when the dimension is in scope: references/hierarchy.md, references/spacing.md, references/typography.md, references/color.md, references/finishing.md. These carry general design wisdom; always cross-check the concrete numbers against the project's own {{designSystem.tokens}}, which override any example value below.
Read the variants produced by /design under .pipeline/work-packages/<id>/design/<variant-slug>/.
1. Screenshot each variant's mockup (or read its spec if no mockup exists)
2. Score it (0-10) against the 9 dimensions below
3. List specific findings (CRITICAL / WARNING / SUGGESTION)
4. Fix the highest-priority issue
5. Re-screenshot and re-score
6. Repeat until score >= 7 or 4 iterations reached
Variants below 5 fail and get regenerated; variants 5-7 get noted findings; variants ≥7 advance. If all variants score below 5, the brief was wrong — go back to /design Phase 0 and re-interrogate. Do not push weak variants into the comparison.
Read screenshots, not code, when scoring. Reading code to score visual hierarchy is theatre.
#hex values. Raw padding utilities instead of density tokens. Re-implementing what an existing primitive already does. Borders used differently than sibling components. Mixed radii in the same view.outline: none without replacement. Missing aria-labels. Non-semantic elements used as buttons.Tied to the design system's "premium" checklist and recommendations tables. Calibrate the exact numbers against {{designSystem.tokens}}; treat the values below as defaults.
font-variant-numeric: tabular-nums).+0.02em) on uppercase status/tag labels.prefers-reduced-motion is respected.#000 background in dark mode; the high-emphasis text token is off-white, not #fff.| Score | Meaning | Action |
|---|---|---|
| 9-10 | Exceptional | Ship it. This would impress a design-conscious user. |
| 7-8 | Good | Minor polish items. Acceptable for production. |
| 5-6 | Mediocre | Needs work on 2-3 dimensions before shipping. |
| 3-4 | Poor | Structural issues. Rethink the approach. |
| 0-2 | AI Slop | Start over. This looks like generic AI output. |
## Design Critique: [ComponentName / work-package-id]
DESIGN-CRITIQUE: variant=<slug> score=<int>/10 rounds=<n> axes: H:_ S:_ T:_ C:_ Co:_ St:_ A:_ AS:_ PP:_
### CRITICAL
- [file:line] Description of issue → specific fix
### WARNING
- [file:line] Description of issue → specific fix
### SUGGESTION
- [file:line] Description of issue → specific fix
### What's Working Well
- List 1-2 things the component does right (reinforces good patterns)
The retro reads rounds=<n> to track design-quality drift over time. Record the result to the work package's progress file: .pipeline/progress/<id>.json.
Render the variant and capture it, then read the image. Use whatever headless-browser screenshot tool the project provides; a typical invocation:
# Use the project's configured screenshot tool (see pipeline.config.yml or project docs).
# Example — Screenshot a running component view:
# <screenshot-tool> "<component-preview-url>" /tmp/critique-ComponentName.png
# Example — Screenshot a static HTML mockup:
# <screenshot-tool> "file://<absolute-path>/mockup.html" /tmp/critique-mockup.png
Read the screenshot. Evaluate it visually. Do not evaluate design from code alone.
These checks go beyond what AST linters catch. Token/scale violations (hardcoded colors, off-grid spacing, off-scale radius/font/motion, AI-aesthetic tells) are caught by the project's verify command, {{verify}} (e.g. via a design-lint rule set) — don't duplicate that work here. This section is for the judgment calls the linter can't make.
{{designSystem.path}} to know what exists. Reimplementing any of these inline is CRITICAL.{{designSystem.tokens}}.If the design system defines density tiers, an AST rule typically gates literal padding; this critique covers the contextual cases the rule can't see:
DESIGN-CRITIQUE: line with per-axis scores and a rounds count./design..pipeline/progress/<id>.json.$ARGUMENTS
npx claudepluginhub ambrovia/agent-skills-pipeline --plugin pipelineProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.