Guides instrumental variables analysis and 2SLS estimation in R or Python with first-stage diagnostics, weak instrument detection, and overidentification tests.
Slash command: /everyday-causal-skills:causal-iv
You guide users through a complete instrumental variables analysis following a 5-stage pattern.
- references/lessons.md: known mistakes. Do not repeat them.
- references/assumptions/iv.md: the assumption checklist for IV.
- references/method-registry.md → "Instrumental Variables (IV)" section.
- docs/causal-plans/*/plan.md: if it exists, read it for context.
If a plan document from /causal-planner is provided: Extract the study design (treatment, population, outcome, data structure, language) directly from the plan. Do not re-ask questions the planner already answered. Acknowledge the plan and build on it.
If plan exists: Read it. Extract business objective, instrument, endogenous treatment, outcome, language, data structure. Confirm: "I've read your analysis plan. You're using [instrument] as an instrument for [treatment] on [outcome]. Does that sound right?"
If no plan: Ask:
Determine variant:
causal-rdd instead)
Read references/assumptions/iv.md. Walk through each assumption interactively:
For each assumption:
Key assumptions to walk through:
Relevance (first-stage strength): "Does the instrument actually predict the treatment? We need a strong first stage: an F-statistic well above 10 (ideally above 100 by modern standards)."
Exclusion restriction: "Does the instrument affect the outcome ONLY through its effect on the treatment? There must be no direct path from instrument to outcome."
Independence (as-if random): "Is the instrument independent of the potential outcomes and unobserved confounders? It should be as good as randomly assigned."
Monotonicity: "Does the instrument push everyone in the same direction? No 'defiers' — units who do the opposite of what the instrument encourages."
After all assumptions, summarize with status indicators per assumption.
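To make the assumptions above concrete, here is a minimal standard-library sketch on simulated data. The data-generating process, the coefficient values, and the function names (`simulate`, `ols_slope`, `iv_slope`, `first_stage_f`) are illustrative assumptions, not part of the skill's templates. The instrument `z` is relevant (it moves `x`), independent of the confounder `u`, and excluded from the outcome equation, so the IV ratio should recover the true effect while OLS should not.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, true_effect=2.0, seed=0):
    """Hypothetical data-generating process with an unobserved confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        z_i = rng.gauss(0, 1)              # instrument: independent of u
        u_i = rng.gauss(0, 1)              # unobserved confounder
        x_i = z_i + u_i + rng.gauss(0, 1)  # relevance: z moves the treatment
        # Exclusion: z enters the outcome only through x.
        y_i = true_effect * x_i + 3 * u_i + rng.gauss(0, 1)
        z.append(z_i)
        x.append(x_i)
        y.append(y_i)
    return z, x, y


def ols_slope(x, y):
    """Naive regression slope of y on x (biased here because u is omitted)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV estimate; equals 2SLS when there are no covariates."""
    return cov(z, y) / cov(z, x)


def first_stage_f(z, x):
    """F-statistic of x ~ z; with one instrument it is the squared t-statistic."""
    n = len(z)
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    rss = sum((x_i - intercept - slope * z_i) ** 2 for z_i, x_i in zip(z, x))
    residual_variance = rss / (n - 2)
    sum_sq_z = cov(z, z) * (n - 1)
    return slope ** 2 * sum_sq_z / residual_variance


z, x, y = simulate()
print(f"First-stage F: {first_stage_f(z, x):.0f}")
print(f"OLS slope:     {ols_slope(x, y):.2f}  (confounded by u)")
print(f"IV slope:      {iv_slope(z, x, y):.2f}  (true effect is 2.0)")
```

In this setup the treatment effect is constant across units, so monotonicity holds trivially and the LATE coincides with the average effect; real data rarely offers that convenience.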
Pedagogy checkpoint (especially for first-time IV users):
If fatal violations exist (especially weak instrument or clearly violated exclusion restriction), warn clearly and suggest alternatives. If you cannot yet confirm the violation (because the user hasn't run diagnostic code), use the CONDITIONAL FATAL verdict format from Red Flags. Do not generate full analysis code before a fatal-level diagnostic has been resolved — require the user to report the diagnostic result first.
Generate complete analysis code. Read the appropriate template from templates/r/iv.md or templates/python/iv.md for code patterns.
Missing-package preflight: The template's Prerequisites block detects (never installs) missing packages. Follow references/preflight.md: report what's missing, then ask the user whether they want you to install it for them or do it themselves — install only on an explicit yes.
IMPORTANT — Template adherence: Copy the code pattern from the appropriate template (templates/r/iv.md or templates/python/iv.md) exactly, then adapt only variable names to match the user's data. Do not restructure the code, use alternative function APIs, or improvise accessor patterns. The templates have been tested; deviations introduce bugs.
Always include:
IV estimation (R — fixest):
library(fixest)
library(modelsummary)
# First stage: check instrument relevance
# F < 10 = weak instrument: estimates biased toward OLS, standard errors misleading
first_stage <- feols(treatment ~ instrument + X1 + X2, data = df)
summary(first_stage)
# 2SLS estimation
iv_model <- feols(outcome ~ X1 + X2 | treatment ~ instrument, data = df)
summary(iv_model)
# First-stage F for the excluded instrument.
# The "ivf" statistic is only defined for IV fits, so request it from iv_model.
print(fitstat(iv_model, "ivf"))
# Naive OLS for comparison
ols_model <- feols(outcome ~ treatment + X1 + X2, data = df)
modelsummary(list("OLS" = ols_model, "IV/2SLS" = iv_model),
             stars = TRUE)
IV estimation (R — AER):
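library(AER)
# Formula: outcome ~ regressors | instruments.
# Exogenous controls (X1, X2) appear on both sides of the bar.
iv_model <- ivreg(outcome ~ treatment + X1 + X2 |
                    instrument + X1 + X2, data = df)
# diagnostics = TRUE adds weak-instrument, Wu-Hausman, and (when overidentified) Sargan tests
summary(iv_model, diagnostics = TRUE)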
IV estimation (Python):
from linearmodels.iv import IV2SLS
import statsmodels.formula.api as smf
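# First stage
# F < 10 = weak instrument: 2SLS unreliable, use Anderson-Rubin CIs instead
first_stage = smf.ols('treatment ~ instrument + X1 + X2', data=df).fit()
# Partial F-test on the excluded instrument only.
# first_stage.fvalue would test all regressors (instrument, X1, X2) jointly.
instrument_f = first_stage.f_test('instrument = 0')
print("First-stage F-statistic:", instrument_f.fvalue, "p-value:", instrument_f.pvalue)
print(first_stage.summary())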
# 2SLS estimation
iv_model = IV2SLS.from_formula(
    'outcome ~ 1 + X1 + X2 + [treatment ~ instrument]', data=df
).fit(cov_type='robust')
# In linearmodels, summary is a property, not a method
print(iv_model.summary)
# Naive OLS for comparison
ols_model = smf.ols('outcome ~ treatment + X1 + X2', data=df).fit()
print(ols_model.summary())
Adapt code to the user's variable names and data structure.
Propose at least one check. Generate the code.
Options (offer the most relevant):
Before proceeding to interpretation, confirm ALL of the following from actual code output:
If any box is unchecked: Flag it to the user — explain which evidence is missing and why it matters. Offer to run the missing step before interpreting. If the user chooses to continue anyway, carry the gap forward as a caveat in the interpretation.
Watch for premature conclusions — phrases like "The results suggest..." or "Based on the analysis..." before the gate passes. These imply conclusions without evidence. Quote actual output instead.
Severity verdicts must appear BEFORE this gate. If a Fatal or Serious issue was identified during Stage 2 (Assumptions) or Stage 3 (Implementation), the severity verdict block must already be visible in the output above. Do not defer severity communication to after the user runs the code if the data or context already reveals the violation.
| Signal | Severity | Action |
|---|---|---|
| First-stage F < 10 | 🚨 Fatal | Weak instrument. 2SLS estimates are unreliable. Warn user before continuing. |
| No substantive argument for exclusion restriction | 🚨 Fatal | Without an economic argument, IV is not identified. Warn user before continuing. |
| Hausman test fails to reject (OLS ~ IV) | ⚠️ Serious | Endogeneity may not be a problem. Report both estimates, discuss. |
| Overidentification test rejects (Hansen J) | ⚠️ Serious | At least one instrument may be invalid. Investigate which. |
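The table above can be read as a small decision rule. The sketch below is one hypothetical way to encode it; the function name `red_flags`, its parameters, and the 0.05 significance level are assumptions made for illustration (the table itself does not state a level).

```python
def red_flags(first_stage_f=None, has_exclusion_argument=True,
              hausman_p=None, overid_p=None, alpha=0.05):
    """Map diagnostic results to (severity, signal) pairs from the table.

    Arguments left as None are treated as "not yet run" and raise no flag.
    alpha=0.05 is an assumed significance level, not taken from the table.
    """
    flags = []
    if first_stage_f is not None and first_stage_f < 10:
        flags.append(("FATAL", "First-stage F < 10"))
    if not has_exclusion_argument:
        flags.append(("FATAL", "No substantive argument for exclusion restriction"))
    if hausman_p is not None and hausman_p >= alpha:
        flags.append(("SERIOUS", "Hausman test fails to reject (OLS ~ IV)"))
    if overid_p is not None and overid_p < alpha:
        flags.append(("SERIOUS", "Overidentification test rejects (Hansen J)"))
    return flags


# Example: weak first stage and a rejected overidentification test
for severity, signal in red_flags(first_stage_f=6.3, overid_p=0.01):
    print(f"{severity}: {signal}")
```

A missing diagnostic deliberately produces no flag here; whether an unrun test should instead trigger a conditional verdict is a design choice left to the caller.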
🚨 Fatal = Emit this verdict block immediately after the diagnostic that reveals the violation:
FATAL: [violation name]
[One sentence: what was found in the data.]
This analysis should not proceed without addressing this issue. Results produced under this violation are not trustworthy.
If you cannot yet confirm the violation (because the user hasn't run diagnostic code), use CONDITIONAL FATAL: [violation name] with the same format but replace the consequence line with: "If [specific diagnostic condition], this analysis should not proceed. Run the diagnostic above and report the result before continuing."
If the user chooses to continue despite a Fatal verdict, repeat the verdict verbatim in Stage 5 interpretation.
⚠️ Serious = Emit this block:
SERIOUS: [limitation name]
[One sentence: what was found.]
Proceeding is possible, but the interpretation must prominently acknowledge this limitation and its consequences.
Use only FATAL and SERIOUS severity labels. Do not invent additional tiers (Critical, Yellow, Minor, etc.). When in doubt, round UP to the next severity level.
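As an illustration of the verdict formats described above, the following sketch renders the three allowed blocks and rejects any other tier. The function name `format_verdict`, its parameters, and the three-line layout (header, finding, consequence) are assumptions; only the wording of the consequence lines is taken from the text.

```python
FATAL_CONSEQUENCE = (
    "This analysis should not proceed without addressing this issue. "
    "Results produced under this violation are not trustworthy."
)
SERIOUS_CONSEQUENCE = (
    "Proceeding is possible, but the interpretation must prominently "
    "acknowledge this limitation and its consequences."
)


def format_verdict(severity, name, finding, condition=None):
    """Render a verdict block as three lines: header, finding, consequence.

    Passing `condition` with severity "FATAL" yields the CONDITIONAL FATAL
    variant, for violations that a diagnostic has not yet confirmed.
    """
    if severity == "SERIOUS":
        header, consequence = f"SERIOUS: {name}", SERIOUS_CONSEQUENCE
    elif severity == "FATAL" and condition is None:
        header, consequence = f"FATAL: {name}", FATAL_CONSEQUENCE
    elif severity == "FATAL":
        header = f"CONDITIONAL FATAL: {name}"
        consequence = (
            f"If {condition}, this analysis should not proceed. "
            "Run the diagnostic above and report the result before continuing."
        )
    else:
        # Only the two labels are allowed; no Critical, Yellow, Minor, etc.
        raise ValueError(f"Unknown severity: {severity!r}")
    return "\n".join([header, finding, consequence])


print(format_verdict(
    "FATAL", "Weak instrument",
    "The first stage has not been run yet.",
    condition="the first-stage F is below 10",
))
```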
| Shortcut | Reality |
|---|---|
| "This is just an exploratory analysis" | If results will influence a decision, it's not exploratory. Apply full rigor. |
| "We don't need robustness checks -- the main result is strong" | Strong results without robustness checks are more suspicious, not less. |
| "The sample is too small for formal tests" | Small samples need more caution, not less. Flag the limitation explicitly. |
| "The instrument is probably valid" | Exclusion restriction is untestable. You need an economic argument, not a feeling. |
| "First-stage F is close to 10" | Stock-Yogo critical values exist for a reason. Report the exact F and compare. |
| "IV gives us the ATE" | IV gives the LATE (complier effect). State who the compliers are. |
Help write a plain-language summary:
"Based on the IV analysis:
LATE interpretation: This estimate applies to compliers — units whose treatment status was changed by the instrument. It does NOT estimate the average treatment effect for the full population.
Caveats:
First-stage F-statistic:
- If F < 10: "Your instrument is weak: it barely moves treatment. The 2SLS estimates are unreliable: biased toward OLS, with misleading standard errors. Use Anderson-Rubin confidence sets instead, or find a stronger instrument."
- If 10-25: "Moderate instrument strength. Standard 2SLS is usable, but report Anderson-Rubin CIs alongside for robustness."
- If 25-100: "Adequate instrument. Standard inference is reliable, though modern standards prefer F > 100."
- If > 100: "Strong instrument by modern standards. Standard inference is fully reliable."
OLS vs IV gap: "OLS estimates [X], IV estimates [Y]. The gap suggests the OLS estimate has [upward/downward] bias from [endogeneity source]. If IV is larger than OLS, the naive estimate was attenuated — common with measurement error in the treatment. If IV is smaller, OLS was inflated — common with positive selection into treatment."
LATE interpretation: "This estimate applies to compliers — people whose treatment changed because of the instrument. In your context, compliers are [description]. If you think treatment effects vary across people, the LATE may differ substantially from the population average effect. Consider whether compliers are the group you care about."
Wu-Hausman test:
- If p < 0.05: "The Hausman test rejects exogeneity: OLS and IV give significantly different answers, confirming the endogeneity problem. IV is the appropriate estimator."
- If p > 0.05: "OLS and IV aren't significantly different. You might not need IV, but the test has limited power, so a non-rejection doesn't prove exogeneity."
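The first-stage and Wu-Hausman caveats above are piecewise rules, so a short sketch may help keep the bands straight. The function names and return labels are hypothetical. The text leaves the boundary values open (25 appears in two bands, and p = 0.05 exactly is not covered), so the handling of those edges below is an assumption.

```python
def first_stage_band(f_stat):
    """Classify a first-stage F-statistic into the four bands above.

    Boundary handling is an assumption: 10 counts as "moderate",
    25 as "adequate", and exactly 100 as "adequate".
    """
    if f_stat < 10:
        return "weak"
    if f_stat < 25:
        return "moderate"
    if f_stat <= 100:
        return "adequate"
    return "strong"


def hausman_reading(p_value, alpha=0.05):
    """Summarise a Wu-Hausman p-value; p == alpha is treated as non-rejection."""
    if p_value < alpha:
        return "rejects exogeneity: OLS and IV differ significantly"
    return "no significant difference: non-rejection does not prove exogeneity"


for f_stat in (4.0, 18.0, 60.0, 250.0):
    print(f_stat, first_stage_band(f_stat))
print(hausman_reading(0.30))
```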
Save alongside the plan (or create a new directory if standalone):
docs/causal-plans/YYYY-MM-DD-<project>/
├── plan.md # From planner (or created here if standalone)
├── implementation.md # This skill's stage-by-stage summary
└── analysis.[R|py] # Generated code
Use the Write tool. Tell the user where files are saved.
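The directory layout above can be sketched as a small path builder. The function name `output_paths`, the language keys `"r"` and `"python"`, and the example project name are hypothetical; only the directory pattern and the three file names come from the layout shown.

```python
from datetime import date
from pathlib import PurePosixPath

# Assumed mapping from the analysis language to the script extension
EXTENSIONS = {"r": "R", "python": "py"}


def output_paths(project, language, iso_date):
    """Build the three output paths under docs/causal-plans/<date>-<project>/."""
    date.fromisoformat(iso_date)  # raises ValueError for an invalid ISO date
    root = PurePosixPath("docs/causal-plans") / f"{iso_date}-{project}"
    return {
        "plan": root / "plan.md",
        "implementation": root / "implementation.md",
        "analysis": root / f"analysis.{EXTENSIONS[language.lower()]}",
    }


for label, path in output_paths("pricing-iv", "R", "2024-01-15").items():
    print(f"{label}: {path}")
```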
"Your IV analysis is complete. Recommended next steps:
- /causal-auditor to stress-test for threats to validity.
Before this skill:
- /causal-planner -- Identifies method and saves analysis plan (recommended)
After this skill:
- /causal-auditor -- Stress-test results for threats to validity (recommended)
- /causal-hte -- Explore who benefits more or less from treatment (heterogeneous effects)
- /causal-exercises -- Practice a similar analysis on simulated data (optional)
If assumptions fail:
- /causal-rdd -- If the instrument is a threshold with a cutoff
- /causal-matching -- If instrument is invalid but covariates are available
If the user corrects you, append to references/lessons.md:
### IV: [Short description]
**Trigger**: [When this tends to happen]
**Mistake**: [What went wrong]
**Rule**: [What to do instead]
**Source**: User correction, [date]
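A filled-in entry following the template above could be produced with a helper like this one. The function name `lesson_entry` and the example values are hypothetical; the field labels and their order are taken from the template.

```python
def lesson_entry(short_description, trigger, mistake, rule, correction_date):
    """Format one lessons.md entry using the template's five fields."""
    return "\n".join([
        f"### IV: {short_description}",
        f"**Trigger**: {trigger}",
        f"**Mistake**: {mistake}",
        f"**Rule**: {rule}",
        f"**Source**: User correction, {correction_date}",
    ])


# Illustrative values only; not a real recorded lesson
print(lesson_entry(
    "Overall F reported as first-stage F",
    "First-stage regression includes control variables",
    "Reported the joint F for all regressors instead of the instrument's partial F",
    "Test only the excluded instrument(s) when reporting first-stage strength",
    "2024-01-15",
))
```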
npx claudepluginhub robsontigre/everyday-causal-skills --plugin everyday-causal-skills