From decision-analytics-toolkit
Analyze a dataset to understand it and test claims about it: exploratory data analysis (EDA), summary statistics, distributions, correlation, hypothesis testing, and regression, with clear visualizations. Use this skill whenever someone has a CSV / spreadsheet / table of data and wants to understand it, find patterns or relationships, compare groups, test whether a difference is real or noise, quantify how one variable relates to another, check a hypothesis, or spot data-quality problems. Trigger on phrases like "analyze this data," "what's in this dataset," "is the difference significant," "A/B test," "correlation between," "what predicts," "run a regression," "explore this CSV," "summary stats," or any request to draw evidence-based conclusions from tabular data — even when no specific statistical method is named.
Slash command
/decision-analytics-toolkit:data-analysis
Turn a table of data into understanding and defensible conclusions. This skill covers the analytical arc: get to know the data (EDA), then answer specific questions about it (comparisons, relationships, predictions) with the right statistical tool and honest uncertainty.
The cardinal rule: look before you leap. Most analytical mistakes come from running a test or model before understanding the data's shape, quality, and quirks. Always do exploratory analysis first.
Run scripts/eda.py to profile the dataset: shape, types,
missingness, summary stats, distributions, outliers, and correlations. Read the profile
before any modeling. This catches the problems that silently invalidate analyses:
missing data, wrong types, duplicates, impossible values, severe skew.
EDA is non-negotiable and comes first. scripts/eda.py takes a CSV and produces a profile:
per-column type, missing %, unique counts, summary statistics for numerics, top categories
for categoricals, outlier flags (IQR rule), and a correlation matrix for numeric columns,
plus optional distribution and correlation plots. Use it to:
Never skip to a test or model without reading the EDA profile first.
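The kind of profile described above can be sketched in a few lines of pandas. This is not the contents of scripts/eda.py, which are not shown here; the function name `profile`, the column names, and the toy data are all hypothetical, and the 1.5 × IQR fence is the conventional choice for the IQR rule rather than a confirmed setting of the script.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing %, unique count, IQR outlier count."""
    rows = []
    for name in df.columns:
        col = df[name]
        row = {
            "column": name,
            "dtype": str(col.dtype),
            "missing_pct": round(100 * col.isna().mean(), 1),
            "n_unique": col.nunique(dropna=True),
            "n_outliers": None,  # only filled in for numeric columns
        }
        if pd.api.types.is_numeric_dtype(col):
            q1, q3 = col.quantile(0.25), col.quantile(0.75)
            iqr = q3 - q1
            # IQR rule: flag values more than 1.5 * IQR outside the quartiles.
            outside = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
            row["n_outliers"] = int(outside.sum())
        rows.append(row)
    return pd.DataFrame(rows).set_index("column")


# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "age": [34, 29, 41, 38, None, 36, 33, 250],  # 250 is an impossible value
    "plan": ["a", "b", "a", "a", "b", None, "a", "b"],
})
report = profile(df)
n_duplicates = int(df.duplicated().sum())  # fully duplicated rows
print(report)
print("duplicate rows:", n_duplicates)
```

Reading the output before any test is the point: here the impossible age shows up as a flagged outlier, and both columns report missing values that a later model would otherwise drop silently.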
| Question | Data | Method |
|---|---|---|
| "What's typical / how spread out / what shape?" | one variable | Summary stats + histogram |
| "Are these two groups really different?" | numeric outcome, 2 groups | t-test (or Mann–Whitney if non-normal) |
| "Do these several groups differ?" | numeric outcome, 3+ groups | ANOVA (or Kruskal–Wallis) |
| "Are these two categories associated?" | two categoricals | Chi-square test |
| "How do two numeric variables move together?" | two numerics | Correlation (Pearson / Spearman) |
| "How does X (and others) relate to / predict Y?" | numeric outcome + predictors | Linear regression |
| "What drives a yes/no outcome?" | binary outcome + predictors | Logistic regression |
| "Did the change cause the lift?" (A/B test) | outcome by variant | Two-group test + effect size + interval |
scripts/stats_test.py runs the comparison and association tests (t-test, Mann–Whitney,
ANOVA, Kruskal–Wallis, chi-square, correlation) with assumption checks and effect sizes.
scripts/regression.py fits and diagnoses linear and logistic regression.
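As a rough illustration of the "two groups" row of the table, the sketch below picks between a Welch t-test and Mann–Whitney and reports an effect size alongside the p-value. It is an assumption about how such a check could look, not the logic of scripts/stats_test.py: the function name `compare_two_groups`, the Shapiro-Wilk normality screen, the 0.05 threshold, and the sample values are all hypothetical.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Compare two independent numeric samples; return test used, p-value, Cohen's d."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shapiro-Wilk as a rough normality screen on each group (assumed rule).
    looks_normal = (
        stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    )
    if looks_normal:
        test = "welch_t"
        result = stats.ttest_ind(a, b, equal_var=False)
    else:
        test = "mann_whitney"
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
    # Cohen's d using the pooled standard deviation.
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    cohens_d = (a.mean() - b.mean()) / np.sqrt(pooled_var)
    return {"test": test, "p_value": float(result.pvalue), "cohens_d": float(cohens_d)}


# Hypothetical measurements for two groups, invented for illustration.
control = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.7, 12.2]
variant = [12.9, 13.4, 12.8, 13.6, 13.1, 12.7, 13.8, 13.2]
print(compare_two_groups(control, variant))
```

Returning the effect size with the p-value keeps the two questions separate: whether the difference is distinguishable from noise, and how large it is.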
A statistical test answers a narrow question: if there were truly no effect, how surprising is data like this? (the p-value). It does not tell you the size or importance of an effect, or that an effect is real. Guard against the usual misreadings:
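One misreading that follows directly from the paragraph above, treating a small p-value as evidence of a large effect, can be illustrated with a short sketch. It holds a tiny difference fixed (0.5 units against a standard deviation of 10) and only changes the sample size. The helper name `welch_p_and_d` and all the numbers are hypothetical.

```python
from scipy import stats


def welch_p_and_d(mean_a, mean_b, sd, n):
    """Two-sided Welch p-value and Cohen's d for two groups of size n sharing one SD."""
    result = stats.ttest_ind_from_stats(
        mean_a, sd, n, mean_b, sd, n, equal_var=False
    )
    cohens_d = (mean_a - mean_b) / sd
    return float(result.pvalue), cohens_d


# Same tiny effect each time; only the sample size per group changes.
for n in (100, 10_000, 1_000_000):
    p, d = welch_p_and_d(100.5, 100.0, 10.0, n)
    print(f"n per group = {n:>9,}  p = {p:.3g}  Cohen's d = {d:.2f}")
```

The effect size is identical in every row, while the p-value shrinks as the sample grows. A very small p-value here says the difference is unlikely to be zero, not that it matters.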
Regression quantifies how an outcome relates to one or more predictors.
scripts/regression.py fits linear (continuous outcome) or logistic (binary outcome)
models and reports coefficients with confidence intervals, fit (R²/pseudo-R²), and
diagnostics (residual checks, influential points, multicollinearity via VIF). When
interpreting:
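To make the reported quantities concrete, here is a minimal sketch of a linear fit with coefficient confidence intervals, R², and VIF, written with NumPy and SciPy only. It is not scripts/regression.py; the function names `ols_with_intervals` and `vif` and the simulated data are assumptions, and residual and influence diagnostics are left out for brevity.

```python
import numpy as np
from scipy import stats


def ols_with_intervals(X, y, level=0.95):
    """Least-squares fit with an intercept; return coefficients, CIs, and R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = residuals @ residuals / dof  # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t_crit = stats.t.ppf(0.5 + level / 2, dof)
    ci = np.column_stack([beta - t_crit * se, beta + t_crit * se])
    r2 = 1 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
    return beta, ci, r2


def vif(X):
    """VIF per predictor: 1 / (1 - R^2) from regressing it on the other predictors."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, _, r2_j = ols_with_intervals(others, X[:, j])
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Simulated data with known coefficients (2, 3, -1), for illustration only.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 3.0 * x1 - 1.0 * x2 + rng.normal(scale=1.0, size=n)
X = np.column_stack([x1, x2])

beta, ci, r2 = ols_with_intervals(X, y)
vifs = vif(X)
for name, b, (lo, hi) in zip(["intercept", "x1", "x2"], beta, ci):
    print(f"{name:>9}: {b:6.2f}  95% CI [{lo:6.2f}, {hi:6.2f}]")
print(f"R^2 = {r2:.3f}, VIF = {np.round(vifs, 2)}")
```

Each coefficient is the expected change in the outcome per unit change in that predictor with the others held fixed, and the interval shows how precisely the data pin it down. VIF values near 1 indicate the predictors are not strongly collinear.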
Garbage in, garbage out — so audit the data first. The EDA step protects every conclusion downstream. Most "surprising findings" are data errors.
Quantify uncertainty; never imply false precision. Every estimate gets an interval. A point estimate with no error bar invites overconfidence.
Distinguish exploration from confirmation. Patterns found by trawling the data are hypotheses, not results. Finding and "confirming" a pattern in the same dataset overstates certainty — note when a finding is exploratory.
Causation needs more than correlation. Be explicit about what the data design can and can't support. Most datasets license association, not cause.
Make it legible. A clear chart and a plain-language sentence beat a table of coefficients for communicating to a decision-maker. Provide both.
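The principle that every estimate gets an interval can be sketched with a percentile bootstrap, which works for almost any statistic without distributional assumptions. This is one possible approach, not necessarily what the toolkit's scripts do; the function name `bootstrap_ci`, the resample count, and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_resamples=2000, level=0.95, seed=0):
    """Return (point estimate, (low, high)) using a percentile bootstrap."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stat(values), (estimates[lo_idx], estimates[hi_idx])


# Hypothetical order values, invented for illustration.
order_values = [12.5, 14.0, 9.8, 22.1, 15.3, 11.7, 18.4, 13.9, 10.2, 16.6]
point, (low, high) = bootstrap_ci(order_values)
print(f"mean = {point:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Reporting the interval next to the point estimate lets the reader see at a glance how much the estimate could move with a different sample.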
Lead with the answer to the user's question in plain language, with the effect size and its uncertainty, then the supporting detail and a chart. Save figures as files and offer them. For deliverables, offer a written brief (docx) or, when the work is about cleaning/shaping/summarizing a table, an analyzed spreadsheet (xlsx skill). Always state what was done to the data and the main caveats so the conclusion can be trusted and reproduced.
npx claudepluginhub jdstanhope/decision-analytics-marketplace --plugin decision-analytics-toolkit