From compound-science
Guides Bayesian estimation in quantitative social science: prior specification, MCMC with Stan/PyMC/NumPyro, convergence diagnostics (R-hat, ESS), hierarchical models, and posterior summaries.
Slash command
/compound-science:bayesian-estimation
Reference for Bayesian estimation in quantitative social science: from prior elicitation to MCMC implementation to posterior reporting. Covers the full workflow of specifying a Bayesian model, running inference, diagnosing convergence, and communicating results — with applications to structural models, hierarchical designs, and small-sample settings.
Use when the user is:
Skip when the task belongs to another skill: frequentist structural estimation (structural-modeling skill) or reduced-form causal methods (causal-inference skill).
Small samples: Priors act as regularization. With N < 100 observations and several parameters, MLE can overfit or fail to converge. A weakly informative prior then acts like a few extra observations' worth of prior knowledge.
Hierarchical structure: When data have natural groupings (markets, countries, firms), Bayesian partial pooling is more efficient than either pooling all groups (which ignores variation) or fitting each group separately (which ignores shared structure). Bayesian random-effects models borrow strength across groups.
Uncertainty propagation: The posterior is a full distribution. Downstream quantities (elasticities, welfare changes, counterfactuals) inherit full uncertainty without a delta method approximation.
Natural model comparison: Posterior predictive checks and LOO-CV are cleaner than frequentist test-based model selection, especially for non-nested models.
Constrained parameters: Parameters with domain restrictions (discount factors in [0,1], positive variances, simplex probability vectors) are handled naturally via transformed parameter blocks and appropriate priors.
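The "additional observations" reading of a prior can be made concrete in the simplest conjugate case. The sketch below assumes a Normal likelihood with known noise SD and a Normal prior on the mean; the numbers (`ybar`, `n`, `sigma`, prior mean and SD) are hypothetical and chosen only for illustration. In that setting the prior carries the weight of `sigma**2 / prior_sd**2` pseudo-observations located at the prior mean.

```python
def posterior_normal_mean(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior mean and SD of theta when y_i ~ Normal(theta, sigma), sigma known."""
    prior_prec = 1.0 / prior_sd**2      # precision contributed by the prior
    data_prec = n / sigma**2            # precision contributed by the data
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, post_prec ** -0.5


def prior_pseudo_obs(sigma, prior_sd):
    """Number of observations the prior is worth in this conjugate model."""
    return sigma**2 / prior_sd**2


# Hypothetical data: sample mean 0.8 from n = 20 observations, noise SD 2,
# with a Normal(0, 1) prior on theta.
post_mean, post_sd = posterior_normal_mean(0.8, 20, 2.0, 0.0, 1.0)
n0 = prior_pseudo_obs(2.0, 1.0)

# Same answer as pooling n0 pseudo-observations at the prior mean with the data.
pooled_mean = (n0 * 0.0 + 20 * 0.8) / (n0 + 20)
print(n0, round(post_mean, 4), round(pooled_mean, 4))
```

Here the prior is worth 4 observations against 20 real ones, so it pulls the estimate modestly toward zero; with N in the thousands the same prior would be negligible.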
| Scenario | Bayesian | Frequentist |
|---|---|---|
| N = 50, 10 parameters | Preferred — priors regularize | MLE may not converge |
| N = 100,000, 5 parameters | Either works; MLE faster | Preferred — priors irrelevant |
| Hierarchical / multi-group | Preferred — partial pooling | Mixed effects via ML is comparable |
| Uncertainty in counterfactuals | Preferred — natural propagation | Delta method or bootstrap |
| Structural model, large state space | Difficult — MCMC over full model | Preferred — NFXP/MPEC more tractable |
| Non-standard likelihood | Either — depends on differentiability | GMM often more flexible |
| Model comparison, non-nested | LOO-CV / WAIC | AIC/BIC, Vuong test |
| Publication: applied micro top-5 | Use sparingly; justify carefully | Standard expectation |
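The "natural propagation" entry in the table can be sketched as follows: a counterfactual is computed once per posterior draw, and the resulting draws are summarized directly. The draws here are simulated from a Normal distribution as a stand-in for real MCMC output, and the constant-elasticity counterfactual (a 10% price increase) is an illustrative assumption, not a method taken from this skill.

```python
import random
from statistics import mean, quantiles

rng = random.Random(0)

# Stand-in for MCMC output: hypothetical posterior draws of a price elasticity.
alpha_draws = [rng.gauss(-1.2, 0.2) for _ in range(4000)]

# Counterfactual under constant elasticity: percent change in quantity
# after a 10% price increase, evaluated draw by draw.
pct_change = [(1.10 ** a - 1.0) * 100.0 for a in alpha_draws]

# quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = quantiles(pct_change, n=20)
post_mean = mean(pct_change)
lower, upper = cuts[0], cuts[-1]   # equal-tailed 90% credible interval

print(f"mean {post_mean:.2f}%, 90% interval [{lower:.2f}%, {upper:.2f}%]")
```

No linearization is involved: the nonlinearity of the counterfactual is carried through every draw, which is what the delta method only approximates.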
The core task: encode genuine prior knowledge without overwhelming the likelihood.
Weakly informative priors encode vague knowledge (parameters are unlikely to be astronomically large) without strongly influencing the posterior when the data are informative. The defaults below follow the recommendations of Gelman et al. (2008, 2017):
| Parameter type | Recommended prior | Rationale |
|---|---|---|
| Location / intercept (raw scale) | Normal(0, 10) | Very diffuse; most applications won't have effects > 10 SDs |
| Location (standardized predictors) | Normal(0, 2.5) | Rules out extreme effects; standard for logistic regression |
| Scale / variance | HalfNormal(1) or Exponential(1) | Positive, concentrates near zero but with wide right tail |
| Log-scale parameters (elasticities) | Normal(0, 1) on log scale | Implies elasticity plausibly between 0.14 and 7.4 |
| Correlation matrices | LKJ(2) | Shrinks toward identity; LKJ(1) is uniform on correlations |
| Simplex (probability vectors) | Dirichlet(1, ..., 1) | Uniform over simplex |
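The range quoted for the log-scale prior can be checked by exponentiating the prior's bounds. The sketch below shows that 0.14 to 7.4 corresponds to plus or minus 2 SD on the log scale, and computes the slightly narrower central 95% interval for comparison.

```python
import math
from statistics import NormalDist

log_prior = NormalDist(mu=0.0, sigma=1.0)   # Normal(0, 1) on the log scale

# Plus or minus 2 SD on the log scale, mapped back to the natural scale.
low_2sd, high_2sd = math.exp(-2.0), math.exp(2.0)

# Central 95% interval of the same prior, for comparison.
low_95 = math.exp(log_prior.inv_cdf(0.025))
high_95 = math.exp(log_prior.inv_cdf(0.975))

print(round(low_2sd, 2), round(high_2sd, 1))
print(round(low_95, 2), round(high_95, 1))
```

The same two-line check works for any prior placed on a transformed scale, and is a quick way to confirm that the implied natural-scale range is scientifically sensible.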
When the literature provides benchmark values, use them as prior means with SD reflecting plausible variation:
Use the methods-explorer agent to find reference parameter values.

    # Example: informative prior on price elasticity from the literature
    # Literature: price elasticity typically -0.5 to -2.0 (mean around -1.2)
    import pymc as pm

    with pm.Model() as demand_model:
        alpha = pm.Normal("alpha", mu=-1.2, sigma=0.4)      # price elasticity
        beta_inc = pm.Normal("beta_inc", mu=0, sigma=1.0)   # income elasticity
        sigma = pm.HalfNormal("sigma", sigma=1.0)           # noise scale
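Before fitting, it can help to confirm that an informative prior really covers the literature range it was built from. The sketch below does this for the Normal(-1.2, 0.4) elasticity prior using only the standard library; the specific checks (mass inside the literature range, mass on a positive elasticity) are illustrative choices rather than rules from this skill.

```python
from statistics import NormalDist

# The informative prior on price elasticity used above.
prior = NormalDist(mu=-1.2, sigma=0.4)

# Central 95% interval implied by the prior.
lo95, hi95 = prior.inv_cdf(0.025), prior.inv_cdf(0.975)

# Prior mass inside the literature range [-2.0, -0.5].
mass_in_range = prior.cdf(-0.5) - prior.cdf(-2.0)

# Prior mass on an upward-sloping demand curve (elasticity > 0).
p_positive = 1.0 - prior.cdf(0.0)

print(f"95% interval: [{lo95:.3f}, {hi95:.3f}]")
print(f"mass in literature range: {mass_in_range:.3f}")
print(f"P(elasticity > 0): {p_positive:.4f}")
```

If too little mass fell inside the literature range, or too much fell on the wrong sign, that would be a signal to revisit the prior SD before sampling.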
Prior predictive decision rules:
Prior sensitivity analysis: After estimation, refit with the prior SD scaled by 0.5x and 2x. If the posterior mean shifts by more than 0.5 posterior SD, the prior is informative; report the sensitivity results.
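The sensitivity rule can be sketched in a conjugate Normal model, where each "refit" is a closed-form update instead of a new MCMC run. The data summary (`ybar`, `n`, `sigma`) is hypothetical, and measuring the shift in units of the baseline posterior SD is an assumption about how the 0.5 SD threshold is meant to be read.

```python
def normal_posterior(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior mean and SD for y_i ~ Normal(theta, sigma) with sigma known."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * ybar) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical data summary and baseline prior Normal(-1.2, 0.4).
ybar, n, sigma = -0.8, 25, 1.0
prior_mean, prior_sd = -1.2, 0.4

base_mean, base_sd = normal_posterior(ybar, n, sigma, prior_mean, prior_sd)

shifts, flags = {}, {}
for scale in (0.5, 2.0):
    alt_mean, _ = normal_posterior(ybar, n, sigma, prior_mean, scale * prior_sd)
    # Shift of the posterior mean, in units of the baseline posterior SD.
    shifts[scale] = abs(alt_mean - base_mean) / base_sd
    flags[scale] = shifts[scale] > 0.5

for scale in shifts:
    print(f"prior SD x{scale}: shift = {shifts[scale]:.2f} SD, informative = {flags[scale]}")
```

With these numbers, tightening the prior moves the posterior mean by more than 0.5 SD while loosening it does not, so the sensitivity results would need to be reported. For a real model, the same loop would wrap the sampler call.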
Key thresholds for diagnosing MCMC convergence. For full code and detailed guidance, see references/diagnostics-guide.md.
| Diagnostic | Good | Borderline | Bad | Action if Bad |
|---|---|---|---|---|
| R-hat | < 1.01 | 1.01–1.05 | >= 1.05 | Run longer chains; reparametrize |
| Bulk ESS | > 400 | 200–400 | < 200 | Increase draws; reduce autocorrelation |
| Tail ESS | > 1000 | 400–1000 | < 400 | Increase draws; focus on scale parameters |
| Divergences | 0 | — | Any | Raise target_accept to 0.9–0.95; non-centered parametrization |
| BFMI | > 0.3 | 0.2–0.3 | < 0.2 | Reparametrize; check prior scales |
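To show what the R-hat row measures, here is a minimal sketch of the classic split R-hat, which compares between-chain and within-chain variance after splitting each chain in half. This is the textbook formula, not the rank-normalized variant that ArviZ reports, so values can differ slightly from `az.summary`. The chains are simulated for illustration.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains (lists of floats)."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    within = mean([variance(h) for h in halves])        # W: mean within-chain variance
    between = n * variance([mean(h) for h in halves])   # B: scaled variance of chain means
    var_plus = (n - 1) / n * within + between / n       # pooled variance estimate
    return (var_plus / within) ** 0.5


rng = random.Random(1)

# Four well-mixed chains drawn from the same distribution.
good_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four chains where two are stuck around a different mode.
bad_chains = [[rng.gauss(shift, 1.0) for _ in range(1000)] for shift in (0.0, 0.0, 1.0, 1.0)]

rhat_good = split_rhat(good_chains)
rhat_bad = split_rhat(bad_chains)
print(round(rhat_good, 3), round(rhat_bad, 3))
```

The well-mixed chains land in the "Good" column, while the chains centered on different values fall in the "Bad" column, even though each chain looks fine on its own.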
Convergence checklist (run before every reported result):
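
    import arviz as az

    # idata: the InferenceData object returned by the sampler
    summary = az.summary(idata, var_names=["beta", "sigma"])
    assert (summary["r_hat"] < 1.01).all(), "R-hat >= 1.01 for some parameter"
    assert (summary["ess_bulk"] > 400).all(), "Bulk ESS <= 400 for some parameter"
    assert (summary["ess_tail"] > 400).all(), "Tail ESS <= 400 for some parameter"
    divergences = int(idata.sample_stats["diverging"].values.sum())
    assert divergences == 0, f"{divergences} divergences; fix before reporting"
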
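The threshold table can also be encoded as a small grading helper, so that borderline cases are reported and not silently passed. This is a sketch: `classify_diagnostics` is a hypothetical helper, not part of ArviZ, and the input values are made up.

```python
def classify_diagnostics(r_hat, ess_bulk, ess_tail, n_divergences):
    """Grade each diagnostic as 'good', 'borderline', or 'bad' per the table above."""
    status = {}
    status["r_hat"] = "good" if r_hat < 1.01 else "borderline" if r_hat < 1.05 else "bad"
    status["ess_bulk"] = "good" if ess_bulk > 400 else "borderline" if ess_bulk >= 200 else "bad"
    status["ess_tail"] = "good" if ess_tail > 1000 else "borderline" if ess_tail >= 400 else "bad"
    status["divergences"] = "good" if n_divergences == 0 else "bad"
    return status


# Hypothetical values for one parameter, e.g. read from a summary table.
report = classify_diagnostics(r_hat=1.004, ess_bulk=350.0, ess_tail=1200.0, n_divergences=3)
needs_fix = [name for name, grade in report.items() if grade == "bad"]
print(report)
print("fix before reporting:", needs_fix)
```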
For full diagnostic code including trace plots, pair plots, and BFMI computation, see references/diagnostics-guide.md.
A minimal PyMC example for hierarchical demand. For full Stan, PyMC, NumPyro, and brms examples (including Bayesian IV, reparametrization patterns, and Cholesky covariance), see references/implementation.md.

    import pymc as pm
    import arviz as az

    # Assumes these are already built from the data:
    #   n_markets (int), market_idx (int array: market of each observation),
    #   log_price and Y_obs (float arrays; Y_obs holds observed log quantities)
    with pm.Model() as hierarchical_demand:
        # Hyperpriors
        mu_beta = pm.Normal("mu_beta", mu=-1.0, sigma=0.5)
        sigma_beta = pm.HalfNormal("sigma_beta", sigma=0.3)

        # Non-centered parametrization: the default choice for hierarchical models
        beta_raw = pm.Normal("beta_raw", mu=0, sigma=1, shape=n_markets)
        beta = pm.Deterministic("beta", mu_beta + sigma_beta * beta_raw)

        sigma = pm.HalfNormal("sigma", sigma=1.0)
        mu = beta[market_idx] * log_price
        log_quantity = pm.Normal("log_quantity", mu=mu, sigma=sigma, observed=Y_obs)

        trace = pm.sample(
            draws=2000, tune=1000, chains=4,
            target_accept=0.9,  # raise if divergences appear
            random_seed=42, return_inferencedata=True
        )

    # Always run diagnostics immediately after sampling
    print(az.summary(trace, var_names=["beta", "mu_beta", "sigma_beta"]))

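The non-centered trick in the model above relies on a simple identity: if `beta_raw` is standard Normal, then `mu_beta + sigma_beta * beta_raw` has the same distribution as a direct draw from Normal(`mu_beta`, `sigma_beta`). The simulation below is a sketch that checks this numerically with fixed, hypothetical hyperparameter values; it illustrates the reparametrization only and does not reproduce the sampler's behavior.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
mu_beta, sigma_beta = -1.0, 0.3   # hypothetical hyperparameter values
n_draws = 20000

# Centered: draw the group coefficient directly.
centered = [rng.gauss(mu_beta, sigma_beta) for _ in range(n_draws)]

# Non-centered: draw a standard Normal, then shift and scale deterministically.
beta_raw = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
non_centered = [mu_beta + sigma_beta * z for z in beta_raw]

print(round(mean(centered), 3), round(stdev(centered), 3))
print(round(mean(non_centered), 3), round(stdev(non_centered), 3))
```

The benefit for MCMC is that the sampled quantity `beta_raw` has a scale that does not depend on `sigma_beta`, which removes the funnel-shaped dependence between the group effects and their scale.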
Framework selection guide:
| Framework | Best for |
|---|---|
| Stan (cmdstanpy) | Complex custom models, structural work, production code |
| PyMC | Python-native workflows, hierarchical models, rapid iteration |
| NumPyro | Large models, GPU acceleration, JAX integration |
| brms / rstanarm | R users, standard hierarchical families, formula interface |
For Bayesian structural models (BLP, dynamic discrete choice, hierarchical DiD) and reparametrization strategies, see references/structural-models.md.
Agents to invoke alongside Bayesian estimation:
- numerical-auditor: Review MCMC convergence diagnostics (R-hat, ESS, divergences). Report format should include all five convergence metrics.
- econometric-reviewer: Review prior elicitation strategy, sensitivity analysis, and whether priors are consistent with identification. Use for prior predictive checks and moment-matching to literature targets.
- methods-explorer: Find literature calibration targets to set informative prior means. Ask for point estimates and uncertainty ranges, not just means.
- econometric-reviewer: Verify that reported posterior means, credible intervals, and model comparison statistics match the actual ArviZ/Stan output.

Related skills:
- structural-modeling: Frequentist counterpart (NFXP, MPEC, BLP, dynamic discrete choice). Use Bayesian skills on top of the structural model framework when small samples or hierarchical structure warrants it.
- causal-inference: For reduced-form causal methods. Bayesian DiD and RD designs follow the same identification logic; the Bayesian layer adds partial pooling and uncertainty propagation.

Extensions to Bayesian context:
- empirical-playbook skill (diagnostic-battery.md): Convergence diagnostics (R-hat, ESS, divergences) are a subset of the full diagnostic battery
- numerical-auditor agent: Prior predictive simulation is a special case of the Monte Carlo simulation workflow
- empirical-playbook skill (sensitivity-analysis.md): Prior sensitivity analysis (vary prior SD 0.5x and 2x) is a natural robustness check for Bayesian models

| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| Reporting results without checking R-hat | Non-convergence masquerades as valid posterior | Always run full diagnostic checklist first |
| Using uniform priors (flat priors) | Improper or extremely diffuse; creates pathological geometry in hierarchical models | Use weakly informative priors (Normal(0, 2.5) or HalfNormal(1)) |
| Centered parametrization for hierarchical models | Funnel geometry; divergences and low ESS for scale parameters | Non-centered parametrization as the default for hierarchical models; centered tends to sample better only when every group has abundant data |
| Using variational inference for final results | Underestimates posterior variance; may miss multimodality | MCMC for final results; VI only for exploration |
| Skipping prior predictive check | Priors may imply scientifically impossible data | Always run prior predictive before fitting |
| Single chain | Cannot compute R-hat; cannot detect non-convergence | Always run 4 chains |
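| Treating credible interval as confidence interval | A credible interval is a posterior probability statement about the parameter; a confidence interval describes the long-run coverage of the procedure | Report as "90% credible interval" and state the interpretation explicitly |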
| Bayes factors for model comparison | Extremely sensitive to prior specification; computationally unstable | Use LOO-CV (PSIS-LOO) via ArviZ instead |
| Ignoring Pareto k diagnostics | LOO-CV unreliable for high-k observations | Check loo.pareto_k > 0.7; use K-fold CV for problematic observations |
npx claudepluginhub james-traina/science-plugins --plugin compound-science