From science-superpowers
Creates concrete analysis plans from approved research questions, covering model specs, confounds, power, and pipeline structure. Use before touching outcome data or fitting models.
Slash command: `/science-superpowers:designing-the-analysis`
Write a comprehensive analysis plan assuming the analyst has zero context for this project and questionable statistical taste. Document everything they need: which datasets and variables, how each construct is computed, the exact model or test, the sample size / power justification, which confounds are handled and how, the decision rules, and the planned figures. Give them the whole thing as bite-sized steps. DRY. YAGNI. Pre-register. Validate pipelines on known data. Commit frequently.
Assume they are a capable programmer but know almost nothing about this domain, this dataset, or good statistical design.
Announce at start: "I'm using the designing-the-analysis skill to create the analysis plan."
Save plans to: `docs/science-superpowers/plans/YYYY-MM-DD-<topic>.md`
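A small helper can build that filename so every plan lands in the same place. This is a sketch under one assumption: the skill fixes only the directory and the date prefix, so the slug rule here (lowercase, every run of other characters collapsed to one hyphen) and the name `plan_path` are hypothetical.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

PLANS_DIR = Path("docs/science-superpowers/plans")


def plan_path(topic: str, on: Optional[date] = None) -> Path:
    """Return docs/science-superpowers/plans/YYYY-MM-DD-<topic>.md for a topic."""
    on = on or date.today()
    # Hypothetical slug rule: lowercase, each run of non-alphanumerics becomes one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic needs at least one letter or digit")
    return PLANS_DIR / f"{on.isoformat()}-{slug}.md"


print(plan_path("Sleep & Reaction Time", date(2024, 5, 1)).as_posix())
# docs/science-superpowers/plans/2024-05-01-sleep-reaction-time.md
```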
If the question document still bundles several independent investigations, stop and split it — one plan per question. Each plan should produce an interpretable, self-contained result.
Work from the approved question document (`docs/science-superpowers/questions/...`). If no prior effect size is available, the design must still justify the sample size, using a smallest effect size of interest, a precision target, or a sensitivity analysis.
Before defining steps, map the pipeline. Data flows one direction: raw → cleaned → derived → results.
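One way to make the pipeline map checkable is to write it down as data. A minimal sketch, assuming each script reads from one stage and writes to one stage; the script names under `analysis/` are hypothetical. The check flags any script that writes into `raw` or sends data sideways or backward.

```python
# Stages in the only allowed direction of flow
STAGES = ["raw", "cleaned", "derived", "results"]

# Hypothetical pipeline map: script -> (stage it reads, stage it writes)
PIPELINE = {
    "analysis/clean.py": ("raw", "cleaned"),
    "analysis/derive.py": ("cleaned", "derived"),
    "analysis/fit.py": ("derived", "results"),
}


def check_one_direction(pipeline: dict[str, tuple[str, str]]) -> list[str]:
    """Return human-readable violations; an empty list means data only flows forward."""
    problems = []
    for script, (reads, writes) in pipeline.items():
        if writes == "raw":
            problems.append(f"{script} writes to raw/ (raw data is immutable)")
        elif STAGES.index(writes) <= STAGES.index(reads):
            # Same stage or an earlier stage: an in-place overwrite or a backward edge
            problems.append(f"{script} does not flow forward: {reads} -> {writes}")
    return problems


print(check_one_direction(PIPELINE))  # []
print(check_one_direction({"analysis/patch.py": ("derived", "cleaned")}))
```

An unknown stage name raises `ValueError` from `list.index`, which is arguably the right behavior for a typo in the map.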
For each known confound from the survey, state how it is handled: measured and adjusted for, stratified, matched, design-excluded, or explicitly acknowledged as a limitation. "We'll see" is not a plan.
State the threats to validity you are accepting and why.
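Recording confound handling as data makes "we'll see" impossible to write down. A minimal sketch: the strategy labels mirror the five options listed above, a confound accepted as a limitation must carry its reason (the threats-to-validity statement), and the three example confounds are hypothetical.

```python
from dataclasses import dataclass

# The handling strategies allowed by the plan; "we'll see" is deliberately absent
ALLOWED = {"adjusted", "stratified", "matched", "design-excluded", "limitation"}


@dataclass
class Confound:
    name: str
    handling: str
    rationale: str = ""


def validate_confounds(confounds: list[Confound]) -> list[str]:
    """Return one message per confound whose handling is missing or unjustified."""
    problems = []
    for c in confounds:
        if c.handling not in ALLOWED:
            problems.append(f"{c.name}: unknown handling {c.handling!r}")
        elif c.handling == "limitation" and not c.rationale.strip():
            # An accepted threat to validity must say why it is accepted
            problems.append(f"{c.name}: accepted as a limitation without a rationale")
    return problems


# Hypothetical register for the exposure -> outcome question
register = [
    Confound("age", "adjusted"),
    Confound("site", "adjusted"),
    Confound("socioeconomic status", "limitation", "not measured in this dataset"),
]
print(validate_confounds(register))  # []
```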
State the target effect size, alpha, desired power, and the resulting required N — or, for a fixed existing sample, the minimum detectable effect at the planned power. If underpowered, say so and decide with your human partner whether to proceed (e.g., reframe as estimation, not a hypothesis test).
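For a two-group comparison of means, the arithmetic can be sketched with the standard library alone. This uses the normal approximation `n = 2 * ((z(1 - alpha/2) + z(power)) / d)**2` per group, so it slightly understates the N from an exact t-test calculation (for example `statsmodels.stats.power.TTestIndPower`). The effect size d = 0.3 and the fixed sample of 100 per group are hypothetical inputs.

```python
import math
from statistics import NormalDist


def _z(p: float) -> float:
    """Standard normal quantile."""
    return NormalDist().inv_cdf(p)


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Required N per group, two-sided two-sample comparison (normal approximation)."""
    z_total = _z(1 - alpha / 2) + _z(power)
    return math.ceil(2 * (z_total / d) ** 2)


def minimum_detectable_d(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized effect detectable with n per group at the given power."""
    z_total = _z(1 - alpha / 2) + _z(power)
    return z_total * math.sqrt(2 / n)


print(n_per_group(0.3))                     # 175 per group
print(round(minimum_detectable_d(100), 3))  # 0.396
```

The first function answers "how many do we need?"; the second answers the fixed-sample question "what could we even detect?", which is the number to report when the sample already exists.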
Each step is one action (2-5 minutes). Every step that touches data is paired with a validation, the science analog of watching a test fail and then pass.
The simulated-data validation step is mandatory for any nontrivial estimator or model: if you never watched your pipeline recover a known signal, you don't know it works.
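A minimal sketch of that validation, assuming NumPy and a simple hypothetical data-generating process (binary exposure, one continuous covariate, Gaussian noise). It plants a known effect and checks that the estimator recovers it, then plants a zero effect and checks that the estimator does not invent one. In a real plan, `fit_exposure_effect` (a hypothetical name) would be replaced by the pipeline's own estimator.

```python
import numpy as np


def simulate(n: int, effect: float, rng: np.random.Generator):
    """Hypothetical data-generating process with a known exposure effect."""
    exposure = rng.integers(0, 2, size=n)  # binary exposure, roughly 50/50
    age = rng.normal(50, 10, size=n)       # continuous covariate
    outcome = 1.0 + effect * exposure + 0.02 * age + rng.normal(0, 1, size=n)
    return exposure, age, outcome


def fit_exposure_effect(exposure, age, outcome) -> float:
    """OLS of outcome on intercept + exposure + age; returns the exposure coefficient."""
    X = np.column_stack([np.ones_like(age), exposure, age])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(2024)
# The pipeline must recover a planted signal...
est_signal = fit_exposure_effect(*simulate(20_000, effect=0.5, rng=rng))
# ...and must not invent one when the true effect is zero
est_null = fit_exposure_effect(*simulate(20_000, effect=0.0, rng=rng))
assert abs(est_signal - 0.5) < 0.1, est_signal
assert abs(est_null) < 0.1, est_null
print(f"recovered {est_signal:.3f} (true 0.5), null {est_null:.3f} (true 0)")
```

With 20,000 rows the standard error of the exposure coefficient is roughly 0.014, so the 0.1 tolerance fails only if the estimator is actually broken, not by chance.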
Every plan MUST start with this header:
# [Question] Analysis Plan
> **For agentic workers:** REQUIRED SUB-SKILL: pre-register this plan with science-superpowers:preregistering-analysis BEFORE execution. Then use science-superpowers:subagent-driven-analysis (recommended) or science-superpowers:executing-analysis to run it step-by-step. Steps use checkbox (`- [ ]`) syntax for tracking.
**Question:** [the falsifiable question, one sentence]
**Design:** [observational/experimental; cross-sectional/longitudinal; the comparison]
**Data:** [datasets, sample, unit of analysis]
**Primary analysis:** [the one model/test that answers the question]
**Decision rule:** [exactly what result confirms vs. disconfirms H1]
---
### Task N: [Analysis component]
**Artifacts:**
- Create: `analysis/exact_script.py`
- Reads: `data/raw/exact_file.csv` (immutable)
- Writes: `data/derived/exact_output.parquet`
- [ ] **Step 1: Write the loading/transform code**
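```python
import pandas as pd

df = pd.read_csv("data/raw/exact_file.csv")  # raw input: read-only, never overwritten
clean = df[df["value"].between(0, 100)]  # keep only in-range values (bounds inclusive)
clean.to_parquet("data/derived/exact_output.parquet")  # the declared derived artifact
```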
- [ ] **Step 2: Validate the step**
Run: `python analysis/exact_script.py --check`
Expected: `rows in: 10342, rows out: 10298, dropped: 44 (out-of-range)` — dropped count matches the known data-quality issue, not silent loss.
- [ ] **Step 3: Run the primary model exactly as specified**
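```python
import statsmodels.formula.api as smf

# C(site) makes site categorical even if it is coded as integers
model = smf.ols("outcome ~ exposure + age + C(site)", data=clean).fit()
```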
- [ ] **Step 4: Apply the pre-registered decision rule**
The estimate for `exposure` is interpreted against the rule fixed in the pre-registration — not re-decided here.
- [ ] **Step 5: Commit**
```bash
git add analysis/exact_script.py data/derived/exact_output.parquet
git commit -m "analysis: primary model for exposure effect"
```
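The `--check` run in Step 2 can be backed by a guard that fails instead of merely printing when the drop count is wrong. A minimal sketch: `check_report` and `parse_args` are hypothetical names, and the expected drop count is a number written down from the known data-quality issue before the run, not one read back from the output.

```python
import argparse


def check_report(rows_in: int, rows_out: int, expected_dropped: int, reason: str) -> str:
    """Fail loudly unless the dropped-row count matches the documented data-quality issue."""
    dropped = rows_in - rows_out
    if dropped != expected_dropped:
        raise AssertionError(
            f"dropped {dropped} rows but expected {expected_dropped} ({reason}): silent data loss?"
        )
    return f"rows in: {rows_in}, rows out: {rows_out}, dropped: {dropped} ({reason})"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Load and clean the raw data.")
    parser.add_argument("--check", action="store_true", help="verify and print row counts")
    return parser.parse_args(argv)


# Inside the script this would be called as
#   print(check_report(len(df), len(clean), expected_dropped=44, reason="out-of-range"))
print(check_report(10342, 10298, expected_dropped=44, reason="out-of-range"))
```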
Every step must contain the actual content the analyst needs. A step that leaves the analyst to supply that content is a plan failure; never write one.
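A cheap mechanical guard is to scan the finished plan for text that defers work to the analyst. A minimal sketch: apart from "we'll see", which this skill names explicitly, the patterns are illustrative assumptions to extend, and `find_placeholders` is a hypothetical helper.

```python
import re

# Illustrative patterns for steps the analyst would have to fill in themselves
PLACEHOLDER_PATTERNS = [r"\bTBD\b", r"\bTODO\b", r"we'll see", r"\bas appropriate\b"]


def find_placeholders(plan_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for every plan line matching a placeholder pattern."""
    hits = []
    for lineno, line in enumerate(plan_text.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


plan = """- [ ] Step 1: Load data/raw/exact_file.csv
- [ ] Step 2: Handle outliers as appropriate
- [ ] Step 3: Choose covariates (TBD)"""
for lineno, line in find_placeholders(plan):
    print(lineno, line)
```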
After writing the plan, re-read the question document with fresh eyes and check the plan against it.
Fix issues inline. If a question requirement has no task, add the task.
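Once each task lists the question requirements it addresses, finding a requirement with no task is a set difference. A minimal sketch; the requirement IDs, task names, and the helper `uncovered_requirements` are hypothetical.

```python
def uncovered_requirements(requirements: set[str], tasks: dict[str, set[str]]) -> set[str]:
    """Requirements from the question document that no task in the plan addresses."""
    covered = set().union(*tasks.values()) if tasks else set()
    return requirements - covered


# Hypothetical requirement IDs taken from the question document
requirements = {"primary-effect", "dose-response", "site-heterogeneity"}
# Hypothetical map from plan tasks to the requirements each one addresses
tasks = {
    "Task 1: Clean raw data": set(),
    "Task 2: Primary model": {"primary-effect"},
    "Task 3: Dose-response model": {"dose-response"},
}
print(sorted(uncovered_requirements(requirements, tasks)))  # ['site-heterogeneity']
```

Any ID this returns is a task still to be written.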
The plan is not ready to execute until its predictions and decision rules are locked.
REQUIRED NEXT SKILL: Use science-superpowers:preregistering-analysis to freeze the confirmatory hypotheses, predictions, and decision rules before any outcome is observed. Execution happens only after that.
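A decision rule is easiest to lock when it is written as code before any outcome exists. A minimal sketch, assuming the rule is stated on a 95% confidence interval for the exposure effect, a predicted positive direction, and a smallest effect size of interest (SESOI). The `DecisionRule` class and the SESOI of 0.1 are hypothetical; the real thresholds come from the pre-registration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DecisionRule:
    sesoi: float  # smallest effect size of interest, fixed before outcomes are seen

    def apply(self, ci_low: float, ci_high: float) -> str:
        """Classify a confidence interval for the exposure effect."""
        if -self.sesoi <= ci_low and ci_high <= self.sesoi:
            return "disconfirmed"  # whole interval is smaller than any effect we care about
        if ci_low > 0:
            return "confirmed"     # interval excludes zero in the predicted direction
        return "inconclusive"      # interval too wide to decide either way


RULE = DecisionRule(sesoi=0.1)  # hypothetical value; the real one is pre-registered
print(RULE.apply(0.15, 0.45))   # confirmed
print(RULE.apply(-0.05, 0.08))  # disconfirmed
print(RULE.apply(-0.10, 0.40))  # inconclusive
```

With the statsmodels model from the template, the bounds would come from `ci_low, ci_high = model.conf_int().loc["exposure"]`. Checking equivalence first means a statistically significant but negligible effect counts as disconfirming; whichever order you choose, state it in the pre-registration.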