From causal-powers
Use whenever an analysis makes or implies a CAUSAL claim — "the effect of", "X caused Y", "the policy raised", "the treatment increased", "because we did X, Y changed" — or whenever you're running difference-in-differences, event studies, instrumental variables, regression discontinuity, matching, synthetic control, or panel fixed-effects models. Forces the identification strategy and its assumptions to be stated and tested BEFORE estimating, and treats the design-specific robustness suite (parallel trends, first-stage strength, manipulation tests, balance, placebo, sensitivity) as mandatory, not optional. Use in R, Julia, or Python even when the user just says "regress Y on X", "did it work", or "estimate the impact" — a regression coefficient is not a causal effect until the design earns it.
Slash command
/causal-powers:causal-identification
A regression coefficient is a correlation with good posture. It becomes a causal effect only when a design rules out the other explanations — and that design rests on assumptions that no amount of clean data or tight standard errors can supply. The fatal causal error is silent: the code runs, the coefficient is significant, the sign is plausible, and it's still just confounding wearing the costume of an effect.
Core principle: State the identification assumptions before you estimate, and test the ones that are testable. The estimate is only as credible as the assumption you can't test — so make that assumption explicit and argue for it.
Before any model, answer the Angrist–Pischke question: if you could have run the ideal randomized experiment to answer this, what would it be — and what real-world variation are you using as a stand-in for that randomization? Name the source of variation in one sentence and say why it's as good as random. If you can't, you don't have an identification strategy; you have a regression hoping to be one. Everything below — the design, the assumptions, the diagnostics — is just making that "as good as random" claim precise and testable.
NAME THE DESIGN → STATE THE ASSUMPTIONS → TEST THE TESTABLE ONES → ESTIMATE → ATTACK (robustness/placebo/sensitivity) → RECONCILE WITH DESCRIPTIVES
Picking the identification strategy, and changing it once the analysis is underway, are among the most consequential calls in the whole study — they decide what is even being estimated. They are not yours to make silently. When a diagnostic fails (pre-trends violated, weak first stage, manipulation at the cutoff, imbalance that won't resolve) or you discover a threat that calls for a different design, present the threat, the candidate remedies, and your recommendation as a checkpoint and let the user decide — see analysis-checkpoints. Surfacing "the parallel-trends assumption is violated; we could switch to a triple-difference, restrict the sample, or report with a caveat" is the job. Quietly upgrading the design to make the estimate behave is not — especially when it deviates from the pre-analysis plan.
did2s, not vanilla TWFE.
rdrobust); covariate smoothness (no jumps in predetermined covariates at the cutoff); a donut specification excluding points right at the threshold; placebo cutoffs away from the real one.

Adding a control can create bias as easily as remove it. The rule: only condition on variables determined before treatment. A control that is itself an outcome of the treatment reopens the very confounding you're trying to close.
"I added more controls and it got more robust" is not reassurance — more controls can mean more bias. Each control needs a reason it's pre-determined, not just a wish to be thorough.
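The bad-control rule above can be made concrete with a small simulation. This is a minimal sketch under assumed conditions, not part of the skill itself: the data-generating process, the helper `ols`, and the function `simulate_bad_control` are all hypothetical names chosen for illustration. Treatment `d` is randomized with a true effect of 1.0, and `m` is a post-treatment variable driven by both the treatment and an unobserved factor `u` that also drives the outcome.

```python
import random


def ols(columns, y):
    """Return OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [
        [sum(r[p] * r[q] for r in rows) for q in range(k)]
        + [sum(r[p] * yi for r, yi in zip(rows, y))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[r][k] / a[r][r] for r in range(k)]


def simulate_bad_control(n=20_000, seed=0):
    """Hypothetical DGP: randomized d, true effect 1.0, post-treatment control m."""
    rng = random.Random(seed)
    d = [float(rng.random() < 0.5) for _ in range(n)]        # randomized treatment
    u = [rng.gauss(0, 1) for _ in range(n)]                  # unobserved factor
    m = [di + ui + rng.gauss(0, 1) for di, ui in zip(d, u)]  # outcome of treatment
    y = [1.0 * di + ui + rng.gauss(0, 1) for di, ui in zip(d, u)]
    without_m = ols([d], y)[1]    # coefficient on d, no controls
    with_m = ols([d, m], y)[1]    # coefficient on d, conditioning on m
    return without_m, with_m


without_m, with_m = simulate_bad_control()
print(f"effect of d without m: {without_m:.2f}")
print(f"effect of d with m:    {with_m:.2f}")
```

In this particular setup the uncontrolled estimate sits near the true effect of 1.0, while conditioning on `m` pulls the coefficient toward 0.5 in expectation. The numbers are specific to the assumed process; the point is only that the "more controlled" regression is the biased one.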
These are part of the estimate, not a courtesy — but robustness is an argument, not an inventory. "Mandatory" means the threat-relevant checks are not optional — not that you run the whole per-design catalogue. Run the few that would break the result if your identifying assumption fails, not every permutation you can think of: three checks that each probe the real threat beat thirty that probe nothing, and a senior reader treats a sprawling robustness table as a tell of weak identification. Propose the shortlist (the ~3 threat-relevant checks, with rationales) to the user and get approval before running it — this is a checkpoint, not an autonomous fan-out (executing-analysis-plans, analysis-checkpoints).
pre-analysis-plan).

| Design | R | Python | Julia |
|---|---|---|---|
| FE / DiD (TWFE) | fixest::feols | linearmodels.PanelOLS, pyfixest | FixedEffectModels.jl |
| Staggered DiD | did (Callaway–Sant'Anna), did2s, fixest::sunab | differences, pyfixest | — (call R, or hand-roll CS) |
| IV | `fixest::feols(y ~ x \| f \| d ~ z)`, `ivreg` | | |
| RDD | rdrobust, rddensity (Cattaneo–Jansson–Ma density test, the successor to McCrary's) | rdrobust (py) | — (call R) |
| Matching / PS | MatchIt, WeightIt, cobalt (balance) | causalinference, dowhy, econml | — |
| Sensitivity | sensemakr (Cinelli–Hazlett), robomit (Oster), rbounds | sensemakr (py) | — |
When a stack lacks a mature implementation (much of staggered-DiD and RDD outside R), say so and either call out to R or implement the estimator explicitly rather than silently falling back to a biased TWFE.
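As an illustration of "implement the estimator explicitly", the following is a simplified sketch of group-time average treatment effects in the spirit of Callaway–Sant'Anna, using never-treated units as the comparison group and the period just before treatment as the base. It assumes a balanced panel, no covariates, and no inference, so it is not a substitute for the `did` package. The names `att_gt` and `simulate_staggered` and the simulated data are hypothetical.

```python
import random
from statistics import mean


def att_gt(panel, cohort):
    """ATT(g, t) = mean change since period g-1 for cohort g minus the same for never-treated.

    panel:  {unit: {period: outcome}}, balanced
    cohort: {unit: first treated period, or None if never treated}
    """
    never = [u for u, g in cohort.items() if g is None]
    periods = sorted({t for ys in panel.values() for t in ys})
    out = {}
    for g in sorted({g for g in cohort.values() if g is not None}):
        treated = [u for u, gu in cohort.items() if gu == g]
        base = g - 1  # last untreated period for this cohort
        for t in periods:
            if t < g:
                continue
            change_treated = mean(panel[u][t] - panel[u][base] for u in treated)
            change_never = mean(panel[u][t] - panel[u][base] for u in never)
            out[(g, t)] = change_treated - change_never
    return out


def simulate_staggered(n_per_cohort=200, seed=0):
    """Hypothetical data: cohorts treated at t=2 and t=4, plus never-treated units."""
    rng = random.Random(seed)
    panel, cohort = {}, {}
    unit = 0
    for g, slope in [(2, 1.0), (4, 2.0), (None, 0.0)]:
        for _ in range(n_per_cohort):
            alpha = rng.gauss(0, 1)  # unit fixed effect
            panel[unit] = {}
            for t in range(6):
                exposure = t - g + 1 if g is not None and t >= g else 0
                # Common trend 0.5*t; the effect grows with exposure and differs by cohort.
                panel[unit][t] = alpha + 0.5 * t + slope * exposure + rng.gauss(0, 0.5)
            cohort[unit] = g
            unit += 1
    return panel, cohort


panel, cohort = simulate_staggered()
effects = att_gt(panel, cohort)
for (g, t), value in sorted(effects.items()):
    print(f"cohort {g}, period {t}: ATT = {value:.2f}")
```

Each cohort is compared only with units that are never treated, so no already-treated unit serves as a control. That comparison is the one a single TWFE coefficient cannot guarantee under staggered timing.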
analysis-checkpoints).

| Excuse | Reality |
|---|---|
| "The coefficient is significant, so X causes Y." | Significance measures noise, not confounding. A precisely-estimated correlation is still a correlation. |
| "I added a bunch of controls, so it's causal now." | Controls handle the confounders you observed and named. The dangerous one is the one you didn't. |
| "Parallel trends obviously holds." | Then plotting the pre-trends costs you nothing and earns the reader's trust. If you won't plot it, you're not sure. |
| "TWFE is the standard DiD." | It was. With staggered timing and heterogeneous effects it uses already-treated units as controls, which can bias the estimate and even flip its sign. Use a modern estimator. |
| "The instrument is clearly exogenous." | Exclusion is untestable, which is exactly why it needs a real argument, not an assertion. |
| "Robustness checks are for the appendix." | They're for deciding whether you believe your own result. Run them before you commit to it. |
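The instrument's exclusion restriction cannot be tested, but its relevance can. The sketch below computes the first-stage F-statistic for a single instrument under homoskedastic errors, where F equals the squared t-statistic on the instrument. The function name `first_stage_f` and the simulated data are hypothetical, and the threshold of 10 mentioned in the comments is the conventional rule of thumb rather than something this skill prescribes.

```python
import random
from statistics import mean


def first_stage_f(z, x):
    """F-statistic for one instrument in x = a + b*z + e (homoskedastic; equals t^2 on b)."""
    n = len(z)
    z_bar, x_bar = mean(z), mean(x)
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / s_zz
    a = x_bar - b * z_bar
    rss = sum((xi - a - b * zi) ** 2 for zi, xi in zip(z, x))
    se_b = (rss / (n - 2) / s_zz) ** 0.5  # classical standard error of the slope
    return (b / se_b) ** 2


rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(1000)]
x_strong = [0.5 * zi + rng.gauss(0, 1) for zi in z]   # instrument moves x a lot
x_weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]    # instrument barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
# Conventional rule of thumb: be wary of F below roughly 10.
print(f"strong instrument F = {f_strong:.1f}")
print(f"weak instrument F   = {f_weak:.1f}")
```

With clustered or heteroskedastic data the classical F here is not the right statistic; a robust first-stage F from the estimation package would be the one to report.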
Identification is not a terminal step. Once the design earns the estimate, it propels into exactly one next skill — route imperatively, don't just note the relationship:
digraph causal_identification_next {
"Diagnostic failed? (pre-trends / weak first stage / manipulation / imbalance) or design change needed?" [shape=diamond];
"invoke analysis-checkpoints — surface threat + remedies, user decides" [shape=box style=filled fillcolor=lightgreen];
"Estimate wrong sign / magnitude?" [shape=diamond];
"invoke wrong-number-debugging — rule out a data bug first" [shape=box style=filled fillcolor=lightgreen];
"invoke result-verification — verify before reporting" [shape=box style=filled fillcolor=lightgreen];
"Diagnostic failed? (pre-trends / weak first stage / manipulation / imbalance) or design change needed?" -> "invoke analysis-checkpoints — surface threat + remedies, user decides" [label="yes"];
"Diagnostic failed? (pre-trends / weak first stage / manipulation / imbalance) or design change needed?" -> "Estimate wrong sign / magnitude?" [label="no — design holds"];
"Estimate wrong sign / magnitude?" -> "invoke wrong-number-debugging — rule out a data bug first" [label="yes"];
"Estimate wrong sign / magnitude?" -> "invoke result-verification — verify before reporting" [label="no — design tested, robustness passed"];
}
analysis-checkpoints: present the threat, candidate remedies, and your recommendation; the design call is the user's, never a silent upgrade.
wrong-number-debugging first: rule out a data bug before blaming identification.
result-verification: run the placebo/sensitivity battery as part of verification before any number leaves the building. Do not end at "the coefficient is X".

Causal claim → design named, assumptions stated, testable ones tested, modern estimator used, placebo + sensitivity survived, reconciled with raw data
Otherwise → a correlation with a confident voice
npx claudepluginhub lancegui/causal-powers --plugin causal-powers