Skill

measurement-pipeline

Use when building or running an automated measurement/audit/reporting pipeline that scores many subjects on a recurring basis and reports to stakeholders — a compliance monitor, a benchmark matrix, a metrics dashboard, a scheduled scan-and-report. Establishes complete gap-accounting (every row resolves to a measurement OR a typed gap), evidence-as-proof, infrastructure-vs-real-failure classification, a one-command recurring run, and stakeholder-facing decision reporting.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/fable-discipline:measurement-pipeline

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

For systems whose job is to **measure a defined population on a cadence and report

SKILL.md

70 lines · ~954 tokens

Stats

Stars0

MaintenanceExcellent

Last CommitJun 16, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Measurement Pipeline

For systems whose job is to measure a defined population on a cadence and report it honestly — a domains×themes compliance matrix, a benchmark suite, a metrics dashboard. The integrity bar is honest accounting: a result is trustworthy only if you can account for every subject, distinguish "measured" from "couldn't measure," and prove each measurement.

Read the repo's CLAUDE.md first for the verify commands and conventions.

1. Complete gap-accounting (the core invariant)

Define the full population up front (the matrix: N rows = subjects × dimensions). Then: every row resolves to either a measurement or an explicit, typed gap doc — never silently drops. A "missing" row must only ever mean a pipeline bug, not a quiet omission.

Emit one typed gap per unmeasured row: e.g. not_measurable (a confirmed structural reason), pending_verification (parked/untracked), blocked.
Gaps count toward coverage but are excluded from the success/compliance rate — so coverage and rate are different, honest numbers (measured + gaps = full population accounted; e.g. 90 measured + 10 gaps = 100/100).
Reconcile observed-vs-expected every run and fail loudly on an unexplained delta.

2. Evidence as proof

Every verdict carries reproducible evidence — the probe output (JSON), a screenshot, the raw response — committed as an audit trail (even when other run output is gitignored). A claim a stakeholder might challenge must be backed by an artifact they can open. Screenshots are evidence, not a pass — never let an evidence doc get rolled up as a successful measurement.

3. Infrastructure failure ≠ real failure

A measurement that never executed (browser-launch failure, timeout, network error, missing executable) is an infrastructure error, not a subject failure. Classify and exclude it — counting it as a real violation invents findings. (The canonical mistake: a missing browser binary producing N fake violations on subject: unknown.) Tag infra errors distinctly and keep them out of the rate.

4. One-command recurring run + operator runbook

The cadenced run is one command (run.sh --period YYYYMM): env-guard → measure all subjects with a pinned run id → emit gaps → reconcile + regression gate → pull snapshot + evidence → render report. Document it in a runbook (TL;DR, step table, prerequisites, troubleshooting, cadence). Fail-closed CI env guards with explanatory messages (e.g. wrong index → exit before doing damage).

5. Stakeholder decision reporting

The report is a decision dashboard, not an audit appendix: lead with the site/portfolio-level conclusion and a prioritized action queue, then drill-downs, then evidence on demand, then the appendix. Preserve data-honesty invariants in the rendering (no unverified zeros shown as real, partial verdicts labeled, coverage warnings visible). A fresh external/design review (e.g. Codex) on the report is a worthwhile gate — stakeholders read structure before numbers.

Cross-cutting

Atomic commits; adversarial review (independent finders + per-finding refutation) before shipping; revert-and-document when a wrong approach is caught; verify after each step; honesty in every count and verdict.

See PROVENANCE.md for sources.

measurement-pipeline

Invocation

Context Preview

SKILL.md

measurement-pipeline

Invocation

Context Preview

SKILL.md

Measurement Pipeline

1. Complete gap-accounting (the core invariant)

2. Evidence as proof

3. Infrastructure failure ≠ real failure

4. One-command recurring run + operator runbook

5. Stakeholder decision reporting

Cross-cutting

Similar Skills

Measurement Pipeline

1. Complete gap-accounting (the core invariant)

2. Evidence as proof

3. Infrastructure failure ≠ real failure

4. One-command recurring run + operator runbook

5. Stakeholder decision reporting

Cross-cutting

Similar Skills