From grimoire
Designs statistically sound A/B experiments with pre-registration, power analysis, and guardrail metrics for trustworthy causal evidence.
Slash command
/grimoire:design-experiment
Design statistically sound A/B experiments that produce trustworthy causal evidence for product decisions.
Adopted by: Microsoft (ExP platform, Kohavi's team), Google, Netflix, and Airbnb, all of which maintain internal experimentation platforms.
Impact: Kohavi et al. report that only about one third of A/B experiments at Microsoft produce a positive result. Without rigorous experiment design, teams ship features that feel successful but have no causal impact or actively harm metrics.
Why best: Randomized A/B testing is the most reliable way to obtain causal evidence in product development. Without it, correlation-based decisions ("feature launched, DAU went up, so it worked") fail to account for confounders. Trustworthy experiments require pre-registration of hypotheses, power analysis, and fixed analysis windows.
pwr in R, statsmodels in Python). Do not start the test without sufficient sample size.
Hypothesis: Adding social proof ("1,200 users bought this today") to the product page will increase add-to-cart rate by 5%.
OEC: Add-to-cart rate.
Guardrail: Page load time P95, return rate.
MDE: 5%, α=0.05, power=0.80 → required n=8,400 per variant → 2 weeks at current traffic.
Result: +3.2% (95% CI: [1.1%, 5.3%], p=0.003). Ship.
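The sample-size and run-time step in the example above can be sketched with the standard normal-approximation formula for comparing two proportions. This is a minimal illustration using only the Python standard library, not the skill's own implementation. The function names, the 10% baseline rate, and the 10,000 daily users are hypothetical. The source does not state a baseline rate, so this sketch is not expected to reproduce the 8,400 figure. The MDE is treated as a relative lift, which is also an assumption.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_n_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users per variant for a two-sided two-proportion z-test.

    baseline     -- control conversion rate (hypothetical input, e.g. 0.10)
    relative_mde -- minimum detectable effect as a relative lift (0.05 = +5%)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # treatment rate if the MDE is exactly met
    p_bar = (p1 + p2) / 2               # pooled rate under the null hypothesis
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_run(n_per_variant, daily_users, variants=2):
    """Days needed to enroll every variant, assuming traffic is split evenly."""
    return ceil(n_per_variant * variants / daily_users)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline add-to-cart rate, 10,000 eligible users per day.
    n = required_n_per_variant(baseline=0.10, relative_mde=0.05)
    print(f"users per variant: {n}")
    print(f"days to run: {days_to_run(n, daily_users=10_000)}")
```

Because the required sample grows roughly with the inverse square of the effect size, halving the MDE about quadruples the traffic needed. In practice it is common to round the run time up to whole weeks so that each weekday is equally represented.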
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
Designs controlled experiments (A/B, multivariate, quasi) with hypothesis, success metrics, sample size, and statistical power. For validating features via /design-experiment or phrases like 'design experiment'.
A/B test design — produce an experiment spec with hypothesis, primary metric, MDE, sample size, run time, and decision rule. Also determines when NOT to A/B test and what to do instead. Use when asked to "design an A/B test", "should we test this", "experiment design", "how do we know if this works", "what's the sample size", or "set up an experiment".
Designs statistically rigorous A/B tests with hypothesis, sample size, duration, and results interpretation guide. Activates on experiment design or test setup requests.