From ds
Guides statistical analysis with test selection, assumption checking, power analysis, and APA reporting. Use with /ds:experiment for methodology design, validation, and results.
How this skill is triggered — by the user, by Claude, or both
Slash command
/ds:statistical-analysisThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting.
Statistical analysis is a systematic process for testing hypotheses and quantifying relationships. Conduct hypothesis tests (t-test, ANOVA, chi-square), regression, correlation, and Bayesian analyses with assumption checks and APA reporting.
Role in the ds plugin: This skill is invoked by /ds:experiment at step 3 (Methodology Design) for test selection and power analysis, and at step 7 (Generate Results) for assumption checking and APA-formatted reporting. For concrete statsmodels API patterns (model fitting, result object methods, diagnostics, convergence troubleshooting), see the statsmodels skill. This skill focuses on the statistical workflow: selecting the right test, verifying assumptions, interpreting results, and reporting in APA format.
This skill should be used when:
Use this decision tree to determine your analysis path:
START
|
+-- Need to SELECT a statistical test?
| +-- YES -> See "Test Selection Guide"
| +-- NO -> Continue
|
+-- Ready to check ASSUMPTIONS?
| +-- YES -> See "Assumption Checking"
| +-- NO -> Continue
|
+-- Ready to run ANALYSIS?
| +-- YES -> See "Running Statistical Tests"
| +-- NO -> Continue
|
+-- Need to REPORT results?
+-- YES -> See "Reporting Results"
Use references/test_selection_guide.md for comprehensive guidance. Quick reference:
Comparing Two Groups:
Comparing 3+ Groups:
Relationships:
Bayesian Alternatives: All tests have Bayesian versions that provide:
references/bayesian_statistics.mdALWAYS check assumptions before interpreting test results.
Use the provided scripts/assumption_checks.py module for automated checking:
from scripts.assumption_checks import comprehensive_assumption_check
# Comprehensive check with visualizations
results = comprehensive_assumption_check(
data=df,
value_col='score',
group_col='group', # Optional: for group comparisons
alpha=0.05
)
This performs:
For targeted checks, use individual functions:
from scripts.assumption_checks import (
check_normality,
check_normality_per_group,
check_homogeneity_of_variance,
check_linearity,
detect_outliers
)
# Example: Check normality with visualization
result = check_normality(
data=df['score'],
name='Test Score',
alpha=0.05,
plot=True
)
print(result['interpretation'])
print(result['recommendation'])
Normality violated:
Homogeneity of variance violated:
Linearity violated (regression):
See references/assumptions_and_diagnostics.md for comprehensive guidance.
Primary libraries for statistical analysis:
import pingouin as pg
import numpy as np
# Run independent t-test
result = pg.ttest(group_a, group_b, correction='auto')
# Extract results
t_stat = result['T'].values[0]
df = result['dof'].values[0]
p_value = result['p-val'].values[0]
cohens_d = result['cohen-d'].values[0]
ci_lower = result['CI95%'].values[0][0]
ci_upper = result['CI95%'].values[0][1]
# Report
print(f"t({df:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Cohen's d = {cohens_d:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
import pingouin as pg
# One-way ANOVA
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
print(aov)
# If significant, conduct post-hoc tests
if aov['p-unc'].values[0] < 0.05:
posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
print(posthoc)
# Effect size
eta_squared = aov['np2'].values[0] # Partial eta-squared
print(f"Partial eta-squared = {eta_squared:.3f}")
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Fit model
X = sm.add_constant(X_predictors) # Add intercept
model = sm.OLS(y, X).fit()
# Summary
print(model.summary())
# Check multicollinearity (VIF)
vif_data = pd.DataFrame()
vif_data["Variable"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)
import pymc as pm
import arviz as az
import numpy as np
with pm.Model() as model:
# Priors
mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
sigma = pm.HalfNormal('sigma', sigma=10)
# Likelihood
y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)
# Derived quantity
diff = pm.Deterministic('difference', mu1 - mu2)
# Sample
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
# Summarize
print(az.summary(trace, var_names=['difference']))
# Probability that group1 > group2
prob_greater = np.mean(trace.posterior['difference'].values > 0)
print(f"P(mu1 > mu2 | data) = {prob_greater:.3f}")
Effect sizes quantify magnitude, while p-values only indicate existence of an effect.
See references/effect_sizes_and_power.md for comprehensive guidance.
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | eta-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Chi-square | Cramer's V | 0.07 | 0.21 | 0.35 |
Most effect sizes are automatically calculated by pingouin:
# T-test returns Cohen's d
result = pg.ttest(x, y)
d = result['cohen-d'].values[0]
# ANOVA returns partial eta-squared
aov = pg.anova(dv='score', between='group', data=df)
eta_p2 = aov['np2'].values[0]
# Correlation: r is already an effect size
corr = pg.corr(x, y)
r = corr['r'].values[0]
Determine required sample size before data collection:
from statsmodels.stats.power import (
tt_ind_solve_power,
FTestAnovaPower
)
# T-test: What n is needed to detect d = 0.5?
n_required = tt_ind_solve_power(
effect_size=0.5,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Required n per group: {n_required:.0f}")
# ANOVA: What n is needed to detect f = 0.25?
anova_power = FTestAnovaPower()
n_per_group = anova_power.solve_power(
effect_size=0.25,
ngroups=3,
alpha=0.05,
power=0.80
)
print(f"Required n per group: {n_per_group:.0f}")
Determine what effect size you could detect:
# With n=50 per group, what effect could we detect?
detectable_d = tt_ind_solve_power(
effect_size=None, # Solve for this
nobs1=50,
alpha=0.05,
power=0.80,
ratio=1.0,
alternative='two-sided'
)
print(f"Study could detect d >= {detectable_d:.2f}")
See references/effect_sizes_and_power.md for detailed guidance.
Follow guidelines in references/reporting_standards.md.
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
A one-way ANOVA revealed a significant main effect of treatment condition
on test scores, F(2, 147) = 8.45, p < .001, eta-squared = .10. Post hoc
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
p < .001, d = 1.07). Conditions B and C did not differ significantly
(p = .52, d = 0.18).
Multiple linear regression was conducted to predict exam scores from
study hours, prior GPA, and attendance. The overall model was significant,
F(3, 146) = 45.2, p < .001, R-squared = .48, adjusted R-squared = .47.
Study hours (B = 1.80, SE = 0.31, beta = .35, t = 5.78, p < .001,
95% CI [1.18, 2.42]) and prior GPA (B = 8.52, SE = 1.95, beta = .28,
t = 4.37, p < .001, 95% CI [4.66, 12.38]) were significant predictors,
while attendance was not (B = 0.15, SE = 0.12, beta = .08, t = 1.25,
p = .21, 95% CI [-0.09, 0.39]). Multicollinearity was not a concern
(all VIF < 1.5).
A Bayesian independent samples t-test was conducted using weakly
informative priors (Normal(0, 1) for mean difference). The posterior
distribution indicated that Group A scored higher than Group B
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
BF10 = 45.3 provided very strong evidence for a difference between
groups, with a 99.8% posterior probability that Group A's mean exceeded
Group B's mean.
Consider Bayesian approaches when:
See references/bayesian_statistics.md for comprehensive guidance.
comprehensive_assumption_check(): Complete workflowcheck_normality(): Normality testing with Q-Q plotscheck_homogeneity_of_variance(): Levene's test with box plotscheck_linearity(): Regression linearity checksdetect_outliers(): IQR and z-score outlier detectionnpx claudepluginhub andikarachman/data-science-plugin --plugin dsGuides statistical analysis with test selection, assumption checking, power analysis, and APA-formatted reporting. Use for academic research or when you need help choosing appropriate tests.
Guided statistical analysis with hypothesis-test selection, assumption checking, power analysis, and APA-formatted reporting.
Guides statistical test selection, assumption checks, effect sizes, power analysis, and APA reporting for frequentist (t-test, ANOVA, chi-square, regression) and Bayesian analyses using statsmodels or pymc.