From fable-discipline
Use when building or running an automated measurement/audit/reporting pipeline that scores many subjects on a recurring basis and reports to stakeholders — a compliance monitor, a benchmark matrix, a metrics dashboard, a scheduled scan-and-report. Establishes complete gap-accounting (every row resolves to a measurement OR a typed gap), evidence-as-proof, infrastructure-vs-real-failure classification, a one-command recurring run, and stakeholder-facing decision reporting.
Slash command: /fable-discipline:measurement-pipeline
For systems whose job is to measure a defined population on a cadence and report it honestly — a domains×themes compliance matrix, a benchmark suite, a metrics dashboard. The integrity bar is honest accounting: a result is trustworthy only if you can account for every subject, distinguish "measured" from "couldn't measure," and prove each measurement.
Read the repo's CLAUDE.md first for the verify commands and conventions.
Define the full population up front (the matrix: N rows = subjects × dimensions). Then: every row resolves to either a measurement or an explicit, typed gap doc — never silently drops. A "missing" row must only ever mean a pipeline bug, not a quiet omission.
Typed gaps: not_measurable (a confirmed structural reason), pending_verification (parked/untracked), blocked.
Every verdict carries reproducible evidence (the probe output as JSON, a screenshot, the raw response), committed as an audit trail even when other run output is gitignored. A claim a stakeholder might challenge must be backed by an artifact they can open. Screenshots are evidence, not a pass: never let an evidence doc get rolled up as a successful measurement.
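The accounting rule above (every row of the subjects × dimensions matrix resolves to a measurement or a typed gap, and anything left over is a pipeline bug) could be sketched roughly as follows. The `Row` type, the `reconcile` function, and the dict shapes for `measurements` and `gaps` are hypothetical names chosen for illustration, not part of the skill itself.

```python
from dataclasses import dataclass
from itertools import product

# Gap types named in the text; anything else is treated as an untyped gap.
GAP_TYPES = {"not_measurable", "pending_verification", "blocked"}


@dataclass(frozen=True)
class Row:
    """One cell of the population matrix (hypothetical shape)."""
    subject: str
    dimension: str


def reconcile(subjects, dimensions, measurements, gaps):
    """Split the full population into measured, gapped, and missing rows.

    measurements: dict mapping Row -> path of the evidence artifact (assumed)
    gaps:         dict mapping Row -> gap type string (assumed)
    """
    population = [Row(s, d) for s, d in product(subjects, dimensions)]
    measured, gapped, missing = [], [], []
    for row in population:
        if row in measurements:
            measured.append(row)
        elif row in gaps:
            if gaps[row] not in GAP_TYPES:
                raise ValueError(f"untyped gap for {row}: {gaps[row]!r}")
            gapped.append(row)
        else:
            # Neither measured nor gapped: a pipeline bug, never a quiet omission.
            missing.append(row)
    # Every row is accounted for exactly once.
    assert len(measured) + len(gapped) + len(missing) == len(population)
    return measured, gapped, missing
```

A regression gate would then fail the run whenever `missing` is non-empty, rather than reporting on the rows that happened to come back.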
A measurement that never executed (browser-launch failure, timeout, network
error, missing executable) is an infrastructure error, not a subject failure.
Classify and exclude it — counting it as a real violation invents findings. (The
canonical mistake: a missing browser binary producing N fake violations on
subject: unknown.) Tag infra errors distinctly and keep them out of the rate.
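One way to keep infrastructure errors out of the rate is to classify each outcome before counting. This is a minimal sketch, assuming each outcome is a dict with a `status` of `pass`, `fail`, or `error` and an optional `error_kind`; those field names and the error-kind strings are illustrative assumptions.

```python
# Error kinds that mean the probe never executed (taken from the examples in the text).
INFRA_ERRORS = {
    "browser_launch_failure",
    "timeout",
    "network_error",
    "missing_executable",
}


def classify(outcome):
    """Return 'pass', 'fail', 'infra_error', or 'unknown_error' for one outcome."""
    if outcome["status"] == "error":
        if outcome.get("error_kind") in INFRA_ERRORS:
            return "infra_error"
        return "unknown_error"
    return outcome["status"]


def violation_rate(outcomes):
    """Compute the failure rate over executed measurements only."""
    classes = [classify(o) for o in outcomes]
    executed = [c for c in classes if c in ("pass", "fail")]
    excluded = len(classes) - len(executed)
    # No executed measurements means no rate at all, not a rate of zero.
    rate = executed.count("fail") / len(executed) if executed else None
    return {"rate": rate, "executed": len(executed), "excluded": excluded}
```

Returning `None` when nothing executed keeps a fully broken run from rendering as a clean 0% violation rate.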
The cadenced run is one command (run.sh --period YYYYMM): env-guard → measure
all subjects with a pinned run id → emit gaps → reconcile + regression gate →
pull snapshot + evidence → render report. Document it in a runbook (TL;DR, step
table, prerequisites, troubleshooting, cadence). CI env guards should fail closed with
explanatory messages (e.g. wrong index → exit before doing damage).
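The env-guard step at the front of the run might look something like the sketch below. The `PIPELINE_INDEX` variable name, the `env_guard` function, and the `expected_index` parameter are assumptions made for illustration; only the YYYYMM period format and the fail-closed behaviour come from the text.

```python
import re

# YYYYMM with a month between 01 and 12.
PERIOD_RE = re.compile(r"\d{4}(0[1-9]|1[0-2])")


def env_guard(period, env, expected_index):
    """Exit with an explanatory message unless the run is safe to start.

    period:         the value passed as --period
    env:            a mapping such as os.environ
    expected_index: the index this run is allowed to write to (hypothetical)
    """
    if not PERIOD_RE.fullmatch(period):
        raise SystemExit(f"env-guard: --period must be YYYYMM, got {period!r}")
    index = env.get("PIPELINE_INDEX")  # hypothetical variable name
    if index is None:
        # Fail closed: a missing setting is treated as unsafe, not as a default.
        raise SystemExit("env-guard: PIPELINE_INDEX is not set; refusing to run")
    if index != expected_index:
        raise SystemExit(
            f"env-guard: PIPELINE_INDEX is {index!r}, expected {expected_index!r}; "
            "exiting before any writes"
        )
```

Calling this before any measurement starts means a misconfigured CI job stops with a message that says what to fix, instead of writing a period's results to the wrong place.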
The report is a decision dashboard, not an audit appendix: lead with the site/portfolio-level conclusion and a prioritized action queue, then drill-downs, then evidence on demand, then the appendix. Preserve data-honesty invariants in the rendering (no unverified zeros shown as real, partial verdicts labeled, coverage warnings visible). A fresh external/design review (e.g. Codex) on the report is a worthwhile gate — stakeholders read structure before numbers.
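The data-honesty invariants in the rendering could be enforced at the level of a single cell. The following is only a sketch under assumed inputs: `render_cell` and its `value`, `verified`, and `coverage` parameters are hypothetical, and a real report would likely carry more states.

```python
def render_cell(value, verified, coverage):
    """Return the display text for one count in the report.

    value:    the measured count, or None when nothing was measured
    verified: whether the count is backed by evidence
    coverage: fraction of the population the count covers, from 0.0 to 1.0
    """
    if value is None:
        # A gap is shown as a gap, never as a zero.
        return "n/a (not measured)"
    label = str(value)
    if not verified:
        # An unverified zero must not read like a confirmed clean result.
        label += " (unverified)"
    elif coverage < 1.0:
        label += f" (partial, {coverage:.0%} coverage)"
    return label
```

Keeping the labels inside the rendering function, rather than in a footnote, means a reader skimming the dashboard cannot see a bare number that the pipeline itself would not vouch for.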
Atomic commits; adversarial review (independent finders + per-finding refutation) before shipping; revert-and-document when a wrong approach is caught; verify after each step; honesty in every count and verdict.
See PROVENANCE.md for sources.
npx claudepluginhub petrkindlmann/fable-discipline --plugin fable-discipline