From greenloop
Use this skill whenever the user asks to verify, test, check, validate, or confirm their repo before committing or pushing — including requests like "do a pre commit verification", "run local tests before push", "deep local verification", "fix everything and test it", "make sure this is ready", or "run the full verification pass". Covers stack-aware test discovery, unit/integration/e2e tests, smoke tests, lint checks, type checks, builds, and a runtime smoke/integration harness (binary-launch, migration idempotency, provider mocks, headless webview + browser verification for web UIs, env probe). Supports a tamper-proof clean-room mode (re-run in a fresh worktree of HEAD so uncommitted state can't rig the verdict), an Ed25519-signed witness manifest for tamper-evident green results, and per-check reliability tracking across runs (pass³) that exposes flaky checks instead of retrying them green. Do NOT use for a single narrow test unless explicitly requested.
Slash command: /greenloop:pre-commit-verification
Run the deepest practical local verification surface for the current repo, fix failures, rerun the smallest useful checks first, then confirm the broad local state before commit.
- references/harness/README.md
- references/harness/dev_prod_matrix.sh
- references/harness/env_probe.sh
- references/harness/migration_idempotency.sh
- references/harness/onboarding_clickthrough.spec.ts
- references/harness/provider_mock_test.rs
- references/harness/smoke_http_health.rs
- references/harness/webview_e2e.spec.ts
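This skill runs two layers: the repo's existing checks (unit/integration tests, lint, type checks, build, format) and the runtime smoke/integration harness.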
The harness is what closes the gap where "all tests pass but the app fails when actually launched." Unit tests run in a sanitized context (mocked DB, mocked providers, no real binary, no real webview) and pass while the integrated, launched system fails. The harness exercises the real launched system.
Use this skill when the user asks to verify, test, check, validate, or confirm their repo before committing or pushing.
Do not use this skill for a single narrow test unless the user clearly wants only that narrow check.
This skill is also invoked automatically by app-audit (Phase 0, as the clean-baseline gate) and audit-fix (Phase 0 preflight, and after every fix). When invoked by app-audit, the structured results — including per-category harness outcomes — are folded into AUDIT_LOG.md (see "Structured result reporting" below).
Do not assume one fixed gate. Inspect the repo and run the broadest practical local verification surface that matches its stack and conventions.
Inspect the repo and include the applicable categories:
- .audit/acceptance-map.md when it exists (the spec→test mapping app-audit maintains): run the mapped tests and report acceptance: N mapped, M run, K UNMAPPED; grading the gaps is app-audit's job, running the tests is this skill's

- Check the repo state with git status --short.
- Read README.md, AGENTS.md, pyproject.toml, package.json, Makefile, justfile, CI config, and test config files.
- Look for an existing smoke harness (tests/smoke/, smoke/, or whatever the project adopted). If the repo is a runnable app and the harness is missing, offer to scaffold it (see "Scaffolding the harness").

PYTHONPATH=src ./.venv/bin/python -m pytest tests/ -q --no-header
./.venv/bin/python -m ruff check src tests
./.venv/bin/python -m mypy src
Adjust PYTHONPATH, package path, and test scope to the repo layout.
npm run build
npm run test
npm run lint
Use the commands the repo actually defines rather than forcing a generic pattern.
Combine backend and frontend verification, then run the smoke/integration harness — for a desktop app, that's the layer most likely to catch real failures.
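As an illustration of inspecting the repo before choosing commands, here is a minimal sketch that guesses the stack from manifest files in the repo root. The function name `detect_stacks` is hypothetical, and treating `Cargo.toml` as the Rust marker is an assumption (the text above names only `pyproject.toml` and `package.json`). A real pass would still read the README, Makefile, and CI config to find the commands the repo actually defines.

```shell
# Hypothetical helper: print one stack name per manifest found in the repo root.
detect_stacks() {
  local repo="${1:-.}"
  if [ -f "$repo/pyproject.toml" ]; then echo python; fi
  if [ -f "$repo/package.json" ]; then echo node; fi
  # Assumption: Cargo.toml marks a Rust backend.
  if [ -f "$repo/Cargo.toml" ]; then echo rust; fi
}

# Usage: decide which verification commands to look up for the current repo.
detect_stacks .
```

A repo that prints more than one name is a mixed stack, so both sets of checks apply before the harness runs.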
The harness is a set of runnable tests that exercise the launched system. It lives in the project (scaffolded once, then owned and customized by the project), and pre-commit-verification runs it as part of the verification surface. The pattern mirrors cartographer's _build.py: the skill ships templates in references/harness/, they're copied into the project once, customized per project, then run on every verification pass.
Each template targets a class of runtime failure that static analysis and unit tests miss. The numbering matches the failure taxonomy these were designed against:
| # | Failure class | Template | What it does |
|---|---|---|---|
| 1+4 | runtime + build-graph | smoke_http_health.rs | Launch the binary, fetch /api/health with an Origin header, assert 200 + CORS header present + a non-empty startup log line |
| 2 | context-dependent | dev_prod_matrix.sh | Run the smoke suite under both dev and prod-bundled launch contexts |
| 3 | stateful DB | migration_idempotency.sh | Fresh DB → run migrations → run them again → assert idempotent (second run is a clean no-op) |
| 5 | browser quirks | webview_e2e.spec.ts | Headless automation that drives the actual .app webview (Playwright-style), not a mocked DOM |
| 6 | missing layers | (static — see note) | Required middleware for cross-origin webview. This one is not a runtime test — it's a static checklist item owned by app-audit (Category: security / external integrations). The harness verifies behavior; app-audit verifies the layer is present. |
| 7 | launch env | env_probe.sh | Launch under a stripped environment (env -i PATH=..., launchd-style) to catch "works in my shell, fails when launched by the OS" |
| 8 | external APIs | provider_mock_test.rs | Fixture-based provider tests with wiremock/httpmock — pin external-API contracts so provider drift is caught locally |
| 9 | UX | onboarding_clickthrough.spec.ts | Click-through automation of onboarding-style flows |
Category 6 is intentionally not a runtime test. It's a structural fact ("is the CORS middleware registered?") that app-audit checks statically. The harness's category-1 test catches the behavioral failure (wrong header at runtime); app-audit's checklist catches the structural absence. Both are needed — presence ≠ correct behavior.
Web apps, not just desktop. Categories 5 and 9 apply to any web frontend, not only desktop webviews: point Playwright (or the available browser/preview tooling) at the dev server and drive the actual rendered page — load the route, assert the critical element exists, exercise the key interaction, check the console for errors. The webview_e2e.spec.ts template works as the starting point (swap the .app target for the dev-server URL). When the change under verification touches UI code, this browser layer is required, not optional — a component can compile, pass unit tests, and still render blank; only a real browser catches that class.
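To make the launch-env row (category 7) in the table above concrete, here is a minimal sketch of a stripped-environment probe. It is not the contents of env_probe.sh; the helper name `run_stripped`, the `PATH` value, and the `API_TOKEN` variable are all assumptions chosen for illustration.

```shell
# Hypothetical helper: run a command with nothing but a minimal PATH, roughly
# the way an OS launcher would, so state inherited from the shell disappears.
run_stripped() {
  env -i PATH="/usr/bin:/bin" "$@"
}

# A variable exported in this shell is invisible to the probed command.
export API_TOKEN=from-my-shell
run_stripped sh -c 'echo "API_TOKEN=${API_TOKEN:-<unset>}"'
# prints: API_TOKEN=<unset>
```

Replacing the `sh -c` stand-in with the real binary launch is what would surface "works in my shell, fails when launched by the OS".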
When the repo is a runnable app and the harness isn't present:
"This repo has no runtime smoke harness. The unit tests pass but won't catch launch/integration failures (binary boot, CORS at runtime, migration idempotency, webview behavior, env-under-launchd, provider drift). I can scaffold a starter harness from templates — you'll customize the project-specific bits (binary name, health route, DB URL). Scaffold it?"
- Copy the applicable templates from references/harness/ into the project's chosen location (default tests/smoke/ for a Rust backend, e2e/ for webview specs). Only scaffold categories that apply to the stack.
- Customize the marked points (# CUSTOMIZE: comments). Fill in what's known from the repo (binary path, health endpoint, migration command); leave a TODO for what isn't.
- Wire the harness into the project with a make smoke target, an npm script, or a cargo test --test smoke_* pattern, so it's discoverable on future runs and the project owns it.

Do not re-scaffold if the harness already exists. Treat the project's copies as authoritative (the templates are starting points, not a source of truth to overwrite).
When you launch a server in the background to probe it (HTTP health, webview, etc.),
never end with bare wait and never rely on %-job control. A non-interactive
shell has job control off, so kill %1 silently does nothing, and wait then blocks
forever on a server that never exits — freezing the whole verification (and, under
the DevLoop driver, the entire loop). This is the single most common way a smoke check
hangs. Always: capture the PID, bound the whole thing with timeout, and kill the PID
explicitly (never kill %1; wait).
Safe, stack-agnostic pattern:
# 1) launch, capture the real PID, redirect output to a file (not the tool's pipe)
<server-launch-cmd> > /tmp/smoke.log 2>&1 &
SRV=$!
# 2) wait for readiness with a BOUNDED poll (never an unbounded wait)
for i in $(seq 1 30); do
curl -fsS "$HEALTH_URL" >/dev/null 2>&1 && break
sleep 1
done
# 3) do the probe(s)
curl -sS -i "$HEALTH_URL" | head -20
# 4) ALWAYS tear down by PID, then reap only that PID (never bare `wait`)
kill "$SRV" 2>/dev/null || true
wait "$SRV" 2>/dev/null || true # reaps ONLY $SRV — safe; bare `wait` is not
Belt-and-suspenders: wrap the launch in timeout 60 bash -c '…' so even a botched
teardown self-terminates. If a server ignores SIGTERM, escalate (kill -9 "$SRV").
Prefer an in-process test client (e.g. FastAPI TestClient, axum test server) over a
bound network server when you only need to hit a route — it can't leak a process at all.
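The SIGTERM-then-escalate teardown described above could be wrapped in a small function. This is a sketch under assumptions: the name `stop_server`, the default grace period of 5 seconds, and the `sleep 300` stand-in server are all hypothetical.

```shell
# Hypothetical helper: stop a background server by PID, escalating if needed.
# Usage: stop_server <pid> [grace-seconds]
stop_server() {
  local pid="$1" grace="${2:-5}" waited=0
  kill "$pid" 2>/dev/null || true          # polite SIGTERM first
  # Bounded poll: give the process up to $grace seconds to exit on its own.
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
    sleep 1
    waited=$((waited + 1))
  done
  kill -9 "$pid" 2>/dev/null || true       # no-op if SIGTERM already worked
  wait "$pid" 2>/dev/null || true          # reap ONLY this PID, never bare `wait`
}

# Stand-in for a real server so the sketch runs anywhere.
sleep 300 >/dev/null 2>&1 &
SRV=$!
stop_server "$SRV" 2
```

Every step is bounded, so the worst case is the grace period plus one forced kill rather than an open-ended hang.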
A check that passes this run and a check that passes reliably are different
claims, and flaky tests are how false confidence gets institutionalized. Track
per-check results across runs in .audit/check-history.json (append one entry
per run: check name, pass/fail, HEAD SHA, timestamp; keep the last ~20 runs):
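How a flip count and a stability verdict might be derived from that history, as a minimal sketch. Two assumptions are baked in, since neither term is defined here: a "flip" is any pair of adjacent runs with different outcomes, and pass³-stable means the three most recent recorded runs all passed. The helper names are hypothetical, and results are passed oldest-first as `pass`/`fail` words rather than read from the JSON file.

```shell
# Hypothetical helper: count how often adjacent runs disagree.
flip_count() {
  local flips=0 prev="" r
  for r in "$@"; do
    if [ -n "$prev" ] && [ "$r" != "$prev" ]; then
      flips=$((flips + 1))
    fi
    prev="$r"
  done
  echo "$flips"
}

# Assumed meaning of pass3-stable: the three most recent runs all passed.
is_pass3_stable() {
  [ "$#" -ge 3 ] || return 1
  while [ "$#" -gt 3 ]; do shift; done     # keep only the last three results
  [ "$1" = pass ] && [ "$2" = pass ] && [ "$3" = pass ]
}

flip_count pass fail pass pass fail        # prints 3
```

A check with a nonzero flip count is reported as flaky rather than retried until it goes green.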
reliability: 14/15 checks pass³-stable; harness[5] webview e2e flipped 2× in last 5 runs.

A verification run in the working tree answers "is this tree green?", and that tree
includes every uncommitted file, edited test, and local-only config. That
verdict can be rigged, accidentally or otherwise: a helper file that was never
git add-ed, a test weakened during debugging, a .env that exists only on
this machine. Clean-room mode answers the question that actually matters before
a push or release: "is what's committed green?"
Run the same verification surface in a fresh worktree of HEAD:
CR=$(mktemp -d)/cleanroom
git worktree add --detach "$CR" HEAD
# install deps from committed manifests only, then run the same checks there
( cd "$CR" && <install + run the verification surface> )
RESULT=$?
git worktree remove --force "$CR"
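When the clean-room verdict differs from the in-tree one, the explanation is usually visible in the working tree's status. The following is a minimal sketch for collecting that evidence; the helper name `uncommitted_paths` is hypothetical, and it only lists candidates without deciding which one caused the difference.

```shell
# Hypothetical helper: list working-tree paths that differ from HEAD.
# "??" marks untracked files; other codes mark staged or unstaged changes.
uncommitted_paths() {
  git -C "${1:-.}" status --porcelain --untracked-files=all
}

# Usage (from the repo root), e.g. to attach to a clean-room FAIL report:
#   uncommitted_paths .
```

An untracked helper such as a local-only config module would show up here with a `??` prefix.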
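- If the clean-room result differs from the in-tree result, report the delta (e.g. tests pass in tree, fail at HEAD: src/config_local.py is untracked); that diagnosis is the entire value.
- Full recipe with per-stack install notes: references/verification-depth.md §3.

After a clean-room pass goes green, the result can be recorded as a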
tamper-evident witness instead of just a log line: write
.audit/witness/<timestamp>.json containing the commit SHA, the per-check and
per-harness-category results, truth scores (when invoked by audit-fix), and the
skill versions — then sign it with a local Ed25519 key:
# one-time: generate the witness key (no passphrase; it signs, it's not a secret store)
mkdir -p ~/.selran/keys  # ssh-keygen does not create missing parent directories
ssh-keygen -t ed25519 -N "" -f ~/.selran/keys/greenloop-witness -C greenloop-witness
# sign
ssh-keygen -Y sign -f ~/.selran/keys/greenloop-witness -n greenloop-witness \
.audit/witness/<timestamp>.json
# anyone with the .pub can verify
ssh-keygen -Y verify -f allowed_signers -I greenloop-witness -n greenloop-witness \
-s .audit/witness/<timestamp>.json.sig < .audit/witness/<timestamp>.json
What it proves: this manifest existed in this exact form and was signed by this
key — so a later claim of "it was green at commit X" is checkable, and silent
edits to the record are detectable. What it does NOT prove: that the checks
themselves were comprehensive (that's the coverage declaration's job). Offer it,
don't force it — generate the key only with the user's awareness, and skip
silently when declined. Manifest format: audit-fix's
references/verification-depth.md §5.
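The verify command above reads an `allowed_signers` file, which has to be built from the public key first. A minimal sketch, assuming the OpenSSH allowed-signers layout of principal, optional options, key type, and key data; the helper name `allowed_signers_line` is hypothetical, and restricting the key with a `namespaces=` option is a design choice, not something the recipe above requires.

```shell
# Hypothetical helper: build one allowed_signers entry from a .pub file.
# Entry layout: <principal> [options] <keytype> <base64-key>
# Usage: allowed_signers_line <principal> <namespace> <pubkey-file>
allowed_signers_line() {
  local principal="$1" namespace="$2" pubkey_file="$3"
  # Keep the key type and key data; drop the trailing comment field.
  awk -v p="$principal" -v ns="$namespace" \
    '{ print p, "namespaces=\"" ns "\"", $1, $2 }' "$pubkey_file"
}

# Usage with the key generated above:
#   allowed_signers_line greenloop-witness greenloop-witness \
#     ~/.selran/keys/greenloop-witness.pub > allowed_signers
```

The principal must match the `-I` identity given to `ssh-keygen -Y verify`, and the namespace must match `-n`.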
Always report results in this structure, so app-audit can capture them into AUDIT_LOG.md and audit-fix can act on failures:
Pre-commit verification — <commit-or-HEAD>, <timestamp>
Final result: PASSED | PASSED-AFTER-FIXES | RED
Existing checks:
- unit/integration: <cmd> → <N passed / M failed>
- lint: <cmd> → clean | <issues>
- typecheck: <cmd> → clean | <issues>
- build: <cmd> → success | failure
- format: <cmd> → clean | <issues>
- acceptance (when .audit/acceptance-map.md exists): <N mapped, M run, K UNMAPPED> → <passed/failed>
Smoke/integration harness:
- [1+4] HTTP health smoke → PASS | FAIL: <detail> | SKIPPED: <why>
- [2] dev/prod matrix → PASS | FAIL (dev) | FAIL (prod-bundled): <detail>
- [3] migration idempotency → PASS | FAIL: <detail>
- [5] webview e2e → PASS | FAIL: <detail> | SKIPPED: <why>
- [7] launch-env probe → PASS | FAIL: <detail>
- [8] provider mocks → PASS | FAIL: <detail>
- [9] onboarding clickthrough → PASS | FAIL: <detail>
Auto-fixed this run: <list, or none>
Still red (could not auto-fix): <list, or none>
Repo state: clean | dirty (<paths>)
Clean-room re-run: PASS (worktree @ <sha>) | FAIL: <delta vs in-tree> | not run (<why>)
Witness: .audit/witness/<ts>.json (signed) | not written
Could not verify locally: <list, or none>
When a harness category FAILS and you cannot auto-fix it, report it with enough location detail (test name, the file/route/migration involved) that it can become a finding. app-audit converts these into severity-graded findings in AUDIT_LOG.md; audit-fix then re-runs that specific check to re-verify before and after fixing.
Known-failure baselines. This skill reports results; it does not decide what a failure means for the caller. When invoked by audit-fix mid-remediation, some checks are expected to fail (they correspond to findings not yet fixed) — audit-fix compares the per-check results against its Phase 0 baseline and only treats new failures as regressions. So: always report every check's result faithfully, including expected failures, and never summarize a run as simply "RED" without the per-check breakdown — the breakdown is what makes baseline comparison possible.
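To show why the per-check breakdown matters, here is a minimal sketch of the kind of baseline comparison a caller could run on it. The helper name `new_failures` and the one-failing-check-per-line file layout are assumptions; this is not audit-fix's actual implementation.

```shell
# Hypothetical helper: print checks failing now that were not failing at baseline.
# Both files hold one failing check name per line.
new_failures() {
  local baseline="$1" current="$2"
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
  grep -Fxv -f "$baseline" "$current" || true
}

# Usage:
#   new_failures phase0-failures.txt current-failures.txt
```

An empty result means every current failure was already known, which a bare "RED" summary could never establish.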
After a fix, rerun the smallest relevant test, check, build, or smoke step first. Then rerun the wider suite.
If local workers, dev servers, or background tasks are causing lock contention, generated churn, or CPU pressure, stop them before the deep verification pass when practical.
Do not treat generated files as meaningful product changes unless the repo intentionally tracks them.
- references/adversarial-verification.md
- references/harness/README.md: manifest of the harness templates, scaffolding guidance, customization points, and how each category maps to the failure taxonomy. Read before scaffolding.
- references/harness/*: the per-category test templates. Copy into the project on scaffold; customize the marked points.
- references/verification-depth.md: clean-room worktree recipe with per-stack install notes (§3) and the witness manifest format + sign/verify recipes (§5).
npx claudepluginhub apourmd941/selran-devloop --plugin greenloop