From causal-powers
Use when computing, transforming, cleaning, joining, merging, aggregating, reshaping, or modeling ANY result from data in R, Julia, or Python — before you trust a number, a table, a model metric, or a chart. Establishes data contracts and invariants up front, validates assumptions before building on them, asserts join cardinality before every merge, and freezes validated results as regression baselines. Use this whenever you load a dataset, write a transform or cleaning step, do a join or group-by, fit a model, or are about to report a figure — even if the user only says "analyze this", "what's the trend", "clean this up", "merge these two files", or "build this metric" without ever mentioning tests or validation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/causal-powers:data-contractsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A number you computed but never validated is a guess wearing a lab coat.
A number you computed but never validated is a guess wearing a lab coat.
Core principle: Lock in what must be true before you trust what you discovered.
This is the checker. It asserts invariants and reconciles totals; it does not plan or run the work. Its complement is the doer: during the cleaning phase, data-preparation calls this skill to validate each ingest/join/dedup/recode step, and executing-analysis-plans calls it on every later spine step and fanned-out spec. You reach for data-contracts directly the moment you're about to trust a number or do a join — and the doers reach for it on your behalf throughout.
This is the data-analytics counterpart of test-driven development. TDD's literal ritual — "write a failing test asserting the exact output, then implement" — does not transfer to analysis, because in analysis the output is the unknown. You cannot assert mean == 42.3 before you've computed it; computing it is the whole point. But the discipline underneath TDD transfers exactly, and matters more here.
In software the dangerous bug usually throws — a stack trace, a red test, something loud. In analysis the dangerous bug is silent: the code runs cleanly and hands you a confident, wrong answer.
NA / missing / NaN poisons a mean, or gets silently dropped and biases it.None of these raise an error. So we move the discipline from "assert the answer first" (impossible) to "assert everything around the answer that must hold regardless of the answer." Those are your data contracts and invariants. They are checkable before you know the result and again after — which is exactly the leverage test-first gives you in software.
1. Exploration (you're hunting for the answer: EDA, plotting, trying models). Do not force test-first here — it would be theater. The rule that does apply: validate the inputs before you trust any output, and check intermediate results at every step. Trust nothing you haven't looked at.
2. Reusable rules (a cleaning step, a metric definition, a transform you'll run again, a feature pipeline). Here you do know the rule, so real test-first applies cleanly. Hand-build a tiny fixture with a known answer, write the check, watch it fail, then implement. A metric definition without a test is a rumor.
Most analysis work flows from regime 1 into regime 2: you explore to find the right transform, then you lock it down as a tested, contracted rule. The mistake is staying in regime 1 forever and shipping exploratory code as if it were production.
CONTRACT → CHECK IT BITES → COMPUTE → RECONCILE → FREEZE
wrong-number-debugging to bisect to the bad step — don't patch and proceed. And if the "fix" would drop/winsorize/filter rows or move a number the user has already seen, that's a sample/spec change, not an autonomous fix → analysis-checkpoints.More silent analytics disasters come from joins than from anything else, because a join is the one operation that can change your row count in either direction without erroring. Before every merge, declare the relationship you expect, then assert it:
Say out loud what you expect, then let the tool enforce it:
| Python (pandas) | R (dplyr) | Julia | |
|---|---|---|---|
| Enforce cardinality | df.merge(o, on="id", validate="one_to_one") (or "one_to_many", "many_to_one") | left_join(o, by="id", relationship="one-to-one") (or "many-to-one") | assert key uniqueness before leftjoin: @assert allunique(o.id) |
| Catch dropped/added rows | assert len(out) == len(left) for a left join that must not fan out | stopifnot(nrow(out) == nrow(left)) | @assert nrow(out) == nrow(left) |
| Catch unmatched keys | check indicator=True value counts | anti_join() to see what failed to match | antijoin(left, right, on=:id) |
The row-count assertion around a join is the cheapest, highest-value check in all of data work. Write it every time.
Before a merge in an established project, consult the project's docs/LESSONS.md (and your memory) for prior join failures in this data — a fan-out that bit last month, a vintage mismatch in this crosswalk, a key that wasn't as unique as it looked. The capture half of that loop lives in result-verification; this is the recall half — a logged join bug only stops recurring if you read it back at the moment you're about to repeat it.
These are the things that hold regardless of the answer. Reach for the ones that fit your step:
[0,120], proportions in [0,1], no negative quantities, prices positive."N/A", "unknown", mojibake, trailing-space duplicates).eltype/class is what you think; dollars not cents; seconds not ms; the percent column really is a percent.NA/missing/NaN? Did an operation silently drop them? Is the missingness rate stable vs. last run?A contract is not just about values; it's about where the values came from. Before you build on a column, you should be able to answer: where did this come from, how was it derived, and what was dropped or recoded upstream to make it? Capture that as part of the contract — a short data dictionary alongside the code:
When a number later comes out wrong, this lineage is what lets wrong-number-debugging bisect fast instead of guessing. Document it while you still remember it.
The single most-skipped step, and the one that separates real validation from decoration. After writing a check, prove it can catch the thing it's meant to catch:
If your row-count assertion would pass on a broken join, it isn't protecting you — it's lying to you comfortably. Fix the check until it bites.
Once a number is right, make it stay right. Write the validated result (or its summary statistics) to a small, committed reference file. On the next run, diff against it. This converts the worst class of analytics bug — "the number changed three weeks ago and nobody noticed" — into an immediate, obvious failure. It's the regression test of data work.
Use the idioms native to each stack rather than bolting on a framework you don't need:
| Need | Python | R | Julia |
|---|---|---|---|
| Inline assertion | assert df.shape[0] == n, msg | stopifnot(nrow(df) == n) | @assert nrow(df) == n msg |
| Data contract / schema | pandera, great_expectations, pydantic | assertr, pointblank, validate | @assert on eltype, Test, custom schema check |
| Unit test (reusable rule) | pytest with tiny fixtures | testthat | Test stdlib (@test) |
| Missing handling to watch | NaN vs None, df.isna().sum() | NA, sum(is.na(x)) | missing, count(ismissing, x) |
| Reconcile totals | assert np.isclose(parts.sum(), total) | stopifnot(isTRUE(all.equal(sum(parts), total))) | @assert isapprox(sum(parts), total) |
Use floating-point-aware comparison (np.isclose / all.equal / isapprox) for any reconciliation — exact == on floats will betray you.
The table above covers one check at one line. The moment a script has two or more joins, a reconciliation, or a baseline to freeze, don't re-derive helper functions from scratch — copy the canonical prelude for your language from references/contract-helpers.md: assert_join (declared cardinality + row-count bracket + unmatched-key report), reconcile, na_audit, and freeze_baseline/check_baseline, in Python, R, Julia, and Stata (where isid, merge, assert(), and datasignature are built in).
| Excuse | Reality |
|---|---|
| "The data's clean, it came from the warehouse." | Warehouses fan out, change schemas, and re-key. Trust, then verify with a row count. |
| "I can see it's right." | You can see a number. You can't see the rows the join dropped. |
| "The totals are close enough." | "Close" on a reconciliation usually means rows are leaking. Find out why before you round it away. |
| "Adding checks slows me down." | A wrong number in front of a stakeholder costs far more than the 60 seconds the assertion took. |
Contracts are not a terminal step. The moment a check trips, they propel you out of "keep building" and into a named next skill — route imperatively, don't just note the relationship:
digraph data_contracts_next {
"Reconciliation / total / join-cardinality assertion FAILS?" [shape=diamond];
"invoke wrong-number-debugging — bisect to the bad step" [shape=box style=filled fillcolor=lightgreen];
"The 'fix' drops/filters/winsorizes rows or redefines a metric?" [shape=diamond];
"invoke analysis-checkpoints — STOP, it's a design decision" [shape=box style=filled fillcolor=lightgreen];
"Contract holds — keep computing the next step" [shape=box];
"Reconciliation / total / join-cardinality assertion FAILS?" -> "invoke wrong-number-debugging — bisect to the bad step" [label="yes"];
"Reconciliation / total / join-cardinality assertion FAILS?" -> "The 'fix' drops/filters/winsorizes rows or redefines a metric?" [label="no, but a 'fix' is tempting"];
"The 'fix' drops/filters/winsorizes rows or redefines a metric?" -> "invoke analysis-checkpoints — STOP, it's a design decision" [label="yes"];
"The 'fix' drops/filters/winsorizes rows or redefines a metric?" -> "Contract holds — keep computing the next step" [label="no"];
}
wrong-number-debugging — bisect the pipeline to the exact bad step; do not patch and proceed.analysis-checkpoints — that's a sample/spec redesign, not an autonomous bug fix; don't smuggle it in.Reported result → contract written, check seen to bite, totals reconciled, baseline frozen
Otherwise → not validated, just hopeful
You are not slowing down. You are refusing to be confidently wrong.
npx claudepluginhub lancegui/causal-powers --plugin causal-powersProvides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.