From crucible
Renders a weekly calibration ledger report from Crucible run data, showing caught bugs, verdict breakdowns, per-skill severity rates, and inflation alerts.
How this skill is triggered — by the user, by Claude, or both
Slash command
/crucible:ledgerThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<!-- CANONICAL: shared/ledger-reduce.md -->
Renders docs/ledger/weekly-YYYY-Www.md from the central ledger
~/.claude/crucible/ledger/runs.jsonl (override CRUCIBLE_LEDGER_DIR; plus
falsification.jsonl beside it for the cross-link). The SKILL.md is a thin
prompt wrapper; the source of truth is the testable Python core at
scripts/render_ledger.py. Reports are written to docs/ledger/ relative to
your cwd — run from the crucible repo to update the committed reports.
Skill type: Utility — direct execution, no subagent dispatch.
Manual /ledger [--weeks N] (default N=1, the most recent ISO week).
Invoke the renderer directly. Resolve scripts/render_ledger.py by absolute
path from the plugin root (it self-locates its own modules via __file__, so
no PYTHONPATH is needed and it runs from any cwd):
python3 <plugin_root>/scripts/render_ledger.py --weeks N
That command:
~/.claude/crucible/ledger/runs.jsonl (the
--ledger default; CRUCIBLE_LEDGER_DIR overrides). First-time-ever: if
the file is MISSING, print "no ledger data yet" and exit 0 — do NOT write an
empty report.YYYY-Www).N weeks, writes docs/ledger/weekly-YYYY-Www.md.load_runs): skips blank / malformed / partial-trailing
lines, dedups defensively by (run_id, skill) latest-position-wins.caught_count): counts entries with
would_have_shipped_without_gate == true, EXCLUDING backfilled == true.
Backfilled entries carry severity_histogram: null ⇒ WHS null (the
mechanical L-3 rule) and are reported in a SEPARATE "Backfilled historical
context" section — never in the headline (L-5). This is what test T-5 asserts.significant_rate / fatal_rate (week_summary): computed
from forward-captured entries only (backfilled == false AND
severity_histogram != null). Raw rates are printed from week 1.inflation_alert): alerts when a skill's
significant_rate or fatal_rate exceeds 3× its 4-week rolling median.
Silent for a skill until 4 weeks of forward data exist (the v1 bootstrap
— with no forward history yet, the detector is silent, but raw rates still
print so a human can eyeball drift).falsified_count): the all-time count of falsified
verdicts via the L-9 latest-entry-wins reduction over
falsification.jsonl (scripts.ledger_reduce.reduce, the canonical helper
cited above). Graceful degradation: if falsification.jsonl is absent
(Phase 4 reconciler not built yet), the reduction returns {} ⇒ count 0;
the renderer never crashes./ledger run of a calendar month,
the report appends an advisory checklist prompting a spot-check of 5 random
would_have_shipped_without_gate: true entries against
skills/shared/severity-rubric.md.Design §4 calls for "findings with commit citations". The v1 schema has no
commit field (artifact_hash is null for backfill). So:
backfill-<PR>-quality-gate) cite PR #<PR>,
extracted from the deterministic run_id.run_id) have no commit in v1 ⇒ the citation is
omitted gracefully. No SHAs are invented. A future schema rev capturing the
gating commit can replace this with a real SHA.The headline is would_have_shipped_without_gate == true minus backfilled.
Backfilled entries seed the corpus but never inflate the caught-N number; they
appear only in the separate historical-context section. Inflation drift is
defended structurally (the 3× detector) and culturally (the monthly spot-check).
npx claudepluginhub raddue/crucibleWalks fix/hotfix branches to falsify gating-verdicts, computes per-skill Brier calibration scores, and appends a falsification log to the Crucible calibration ledger.
Generates regression analysis reports combining bisect, baseline, metrics data with code changes, CI/CD logs, issue trackers; analyzes impact, patterns for incident, sprint, release, quarterly scopes.
Orchestrates a Sunday meta-review combining pattern recognition, instinct engine, vault hygiene, and drift detection into a single synthesis note.