From idea-to-ship
Drives a working build toward near-finish-line quality by looping build, see, exercise, check, critique, and rebuild using Playwright, axe/Lighthouse, and headless screenshots. For tightening, self-verifying UI, or forcing iteration on acceptance criteria.
How this skill is triggered — by the user, by Claude, or both
Slash command
/idea-to-ship:build-loopThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
`build-loop` takes a build that compiles-but-isn't-done and drives it toward **near-finish-line craft** by repeating one disciplined cycle — **build → see → exercise → check → critique → rebuild** — until its acceptance criteria pass or a stop-condition fires. It doesn't just re-read its own plan; it *runs, sees, and exercises what it built*, then feeds those observations back into the next reb...
build-loop takes a build that compiles-but-isn't-done and drives it toward near-finish-line craft by repeating one disciplined cycle — build → see → exercise → check → critique → rebuild — until its acceptance criteria pass or a stop-condition fires. It doesn't just re-read its own plan; it runs, sees, and exercises what it built, then feeds those observations back into the next rebuild.
It is composition-first: it orchestrates tools the agent already has — bash (build/test), a headless screenshot + vision to see, Playwright/headless to exercise the flows, axe/Lighthouse to check — via prose. It is not a new program, and not a reinvention of the see-and-exercise agents vendors already ship; it's the discipline that wraps them: a target, a loop, a stop-condition, and an honest ledger of what was actually checked.
This skill is the downstream partner to prompt-pack (which sequences what to build) and ideate (which decides whether to build it). It carries prompt-pack's execute-discipline (never fake a signal you didn't run) and ideate's conditional-design (a design bar only when feel is load-bearing).
Strong triggers — invoke without asking:
Softer triggers — invoke proactively:
Do NOT use this for:
ideate.deep-dive.Routing tie-breaker: build-loop answers "does this build actually work and meet its craft bar?" — not "is it the right thing to build?" (ideate) or "is it sound?" (deep-dive).
The honest bounds are load-bearing — they're stated here, up front, so no one reads a green loop as more than it is.
axe-checkable a11y violations, blown performance budgets. These are machine-checkable — this is the part that moves a rudimentary scaffold toward near-finish-line craft.If a defect lands in "cannot see," the loop's job is to name it and hand it off — never to paper over it with a passing-looking screenshot.
You cannot loop without a target; a loop with no criteria either thrashes forever or stops arbitrarily. Before iterating, lock:
references/acceptance-criteria.md.ideate's conditional-design forcing-function), carry a substantive design direction from the brief — target aesthetic/vibe, 1–2 reference points, key design principles, the intended "feel" — plus concrete design acceptance criteria, and compose frontend-design for the direction + craft. When it's load-bearing the visual design loop is mandatory and runs every iteration (step 2) — a passing unit test never substitutes for it. If it's a plain utility, the bar is plain-but-clear — say so and don't gold-plate it.If these don't exist yet, write them first (or pull them from the CONCEPT_BRIEF.md / pack). No criteria → no loop.
Run this cycle each iteration. The detailed per-step playbook — exact invocations, what to look for, how to score — is in references/loop-procedure.md.
bash. Capture exit codes and pass counts as ground truth. A red build is iteration 0's only job — get it green before anything else; you can't see or exercise what won't run.preview_start/preview_screenshot/preview_click/preview_fill/preview_console_logs; or Claude-in-Chrome; or computer-use); the always-available fallback is a Playwright script that launches the app and screenshots it. Capture the key states (empty/loading/error/hover/focus/filled) across ≥2 viewports (mobile + desktop); critique against (a) the brief's design direction and (b) general craft — hierarchy, spacing/rhythm, typography, color + contrast, motion/interaction feel, state coverage, responsiveness, polish — and emit design defects with severity, like machine defects. Compose frontend-design for how to make it good, not just to flag bad. Capture console errors too. No renderer? Report the visual loop NOT RUN (the design bar is unmet for a load-bearing build) — never fake it. Full checklist + exact tool invocations: references/design-critique.md.axe for a11y (contrast/focus/roles), Lighthouse for performance budgets, and a scan for broken assets/links.Every iteration advances two co-equal tracks — (i) the objective machine facts (build/test/flows/console/a11y) and (ii), when design is load-bearing, the design bar. Neither substitutes for the other: green machine facts never excuse a skipped design loop, and a pretty screenshot never excuses a red test. Keep a per-iteration ledger carrying both — the objective signals (build exit, test pass count, flows passing, console-error count, a11y-violation count) and a design column (design-bar criteria met / design defects open). The objective signals are ground truth; the design column mixes checkable items (states present, contrast, responsiveness) with self-graded taste — keep the soft part labeled. That ledger is the craft-delta evidence you report — not "it looks better."
Stop and report the moment any of these fires:
prompt-pack's execute-discipline: never fake it, render a passed-looking result, or build past it to look complete.Why this can't infinite-loop: BUDGET is an unconditional counter — even if PASS, PLATEAU, and BLOCKED never trigger, the loop halts at N iterations. The other three only ever stop it sooner. Every run terminates.
Always report the iteration count (how many full passes ran), the per-iteration ledger (both tracks), which condition fired, and the open-defect queue at stop — so a one-pass run on a non-trivial build is visibly incomplete and a premature stop is obvious. The craft-delta is the ledger, not a vibe. (More passes close checkable defects — they don't reach the last-mile tail the loop can't see; don't read "more iterations" as "perfect.")
The builder grading its own taste is a monoculture at the visual layer. A critic running a different model, handed only the screenshots + acceptance criteria + design bar (blind to the builder's own rationale), surfaces taste/quality judgments the builder shares with itself and can't see. Fold its defects into step 5's critique.
Modularity is locked: for a plain utility the critic is optional and the loop runs fully without it — the objective signals (build/test/flows/console/a11y) never need a second model. When design is load-bearing, run the critic by default as the taste check — but it stays an enhancement on the design track, never a gate on the objective signals, and never a substitute for actually rendering the UI. Honest limit: even with it you have two correlated models, not a user — the human spot-check is the final taste gate; still zero market signal.
The method is portable; only the tooling degrades. Substitute the fallback and say so in the report — a signal you couldn't run is reported as not run, never as passed (execute-discipline).
| Capability | If unavailable |
|---|---|
bash build/test | Core — if you can't build/test, you can't loop; stop and say so. |
| Render + screenshot (see) | Prefer an interactive renderer (Preview MCP / Claude-in-Chrome / computer-use); else a Playwright screenshot script. If none can run: report the visual loop NOT RUN — and when design is load-bearing the design bar is unmet, so you cannot PASS; never fake it. For a plain utility, proceed on the objective signals and note the gap. |
| Playwright/headless (exercise) | Fall back to whatever drives the surface — run the CLI, call the API, hit the endpoint — and assert outputs. Skip only if there's genuinely no exercisable surface. |
axe / Lighthouse (check) | Skip the a11y/perf checks; report them as not run. |
| Different-model critic | Run the loop without it (it's optional by design). |
Non-browser surfaces (a CLI, a library, a backend) have no see step — run build/test plus a targeted exercise (invoke the binary, call the function/endpoint, assert the result) and skip the visual steps. The loop still earns its keep on the objective signals.
| Situation | What to run |
|---|---|
| A tiny change that just needs to compile | One pass: build + test, eyeball. No full loop. |
| A fresh scaffold or feature (plain utility) | Full loop, N≈3–5, acceptance criteria; design bar = plain-but-clear. |
| A build where feel is the wedge | Full loop with the visual design loop every iteration (render → screenshot key states × viewports → critique) + the different-model critic by default + a flagged human spot-check as the final taste gate; substantive design bar (compose frontend-design). |
| A non-browser CLI/library/backend | Build/test + targeted exercise; skip see; objective signals only. |
The loop scales down to a single build/test pass and up to a multi-iteration, critic-augmented drive — but the discipline is constant: a checkable target, an honest ledger, a stop-condition, and a clean hand-off of the tail it cannot see.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub nelsonwerd/idea-to-ship-skills --plugin idea-to-ship