From ab-testing
Generate a structured, stakeholder-ready experiment report from A/B test results. Supports technical and executive audiences.
Slash command
/ab-testing:experiment-report
Generate a complete experiment report. Use context from the conversation (prior analysis, experiment design) or gather the needed information from `$ARGUMENTS` and follow-up questions.
Collect before generating:
Ask the user for their target audience:
Generate a Markdown report with these sections:
2-3 sentences: what was tested, what happened, what the recommendation is. This should be understandable by anyone without reading further.
Primary Metric:
Secondary Metrics: Present in a table:
| Metric | Control | Treatment | Diff (%) | 95% CI | p-value | Significant? |
|---|---|---|---|---|---|---|
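
As a rough illustration of how one row of this table could be filled in, the sketch below assumes the metric is a conversion rate and uses a two-proportion z-test. The function names `proportion_row` and `to_markdown_row` are hypothetical, and two choices are assumptions rather than part of the skill: `Diff (%)` is treated as relative lift, and the 95% CI is placed on the absolute difference.

```python
import math
from statistics import NormalDist


def proportion_row(name, conv_c, n_c, conv_t, n_t, alpha=0.05):
    """Compute one results-table row for a conversion-rate metric (illustrative only)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    diff = p_t - p_c

    # Pooled standard error for the z-test under H0: both rates are equal.
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    if se_pool == 0:
        p_value = 1.0  # no variation observed, so nothing to test
    else:
        z = diff / se_pool
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the CI on the absolute difference.
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "metric": name,
        "control": p_c,
        "treatment": p_t,
        # Assumption: Diff (%) means relative lift over control.
        "diff_pct": (diff / p_c * 100) if p_c else float("nan"),
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "p_value": p_value,
        "significant": p_value < alpha,
    }


def to_markdown_row(row):
    """Format the computed row to match the table header above."""
    lo, hi = row["ci"]
    return (
        f"| {row['metric']} | {row['control']:.2%} | {row['treatment']:.2%} | "
        f"{row['diff_pct']:+.1f}% | [{lo:+.2%}, {hi:+.2%}] | "
        f"{row['p_value']:.4f} | {'Yes' if row['significant'] else 'No'} |"
    )


row = proportion_row("Signup rate", 100, 1000, 150, 1000)
print(to_markdown_row(row))
```

Continuous metrics such as revenue per user would need a different test (for example Welch's t-test), so this sketch applies to rate metrics only.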
Guardrail Metrics:
| Metric | Control | Treatment | Status |
|---|---|---|---|

For each guardrail, mark PASS (no degradation) or FAIL (significant degradation detected).
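
One possible way to derive the PASS/FAIL status is a one-sided test for movement in the harmful direction. The sketch below assumes the guardrail is a rate metric. The name `guardrail_status`, the `higher_is_better` flag, and the 0.05 threshold are illustrative assumptions, not something the skill prescribes.

```python
import math
from statistics import NormalDist


def guardrail_status(events_c, n_c, events_t, n_t, higher_is_better=True, alpha=0.05):
    """Return "FAIL" if treatment shows statistically significant degradation, else "PASS"."""
    p_c, p_t = events_c / n_c, events_t / n_t
    p_pool = (events_c + events_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    if se == 0:
        return "PASS"  # no variation observed, so no detectable degradation

    z = (p_t - p_c) / se
    # Orient the statistic so that positive values mean "treatment got worse".
    z_bad = -z if higher_is_better else z
    p_one_sided = 1 - NormalDist().cdf(z_bad)
    return "FAIL" if p_one_sided < alpha else "PASS"


# Error rate is a lower-is-better guardrail, so an increase counts as degradation.
print(guardrail_status(50, 10_000, 100, 10_000, higher_is_better=False))
```

A PASS here only means no significant degradation was detected. With small samples, a non-inferiority margin may be a better fit than a plain significance test.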
If segment data is available, break down the primary metric by key dimensions:
Flag any segments where results diverge significantly from the overall result.
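
A simple screen for divergent segments, sketched under the assumption that the primary metric is a conversion rate: flag any segment whose confidence interval on the absolute difference excludes the overall difference. The function name `flag_divergent_segments` and the input layout are hypothetical. Because each segment is part of the overall sample and several segments are checked at once, treat the flags as prompts for a closer look rather than formal findings.

```python
import math
from statistics import NormalDist


def flag_divergent_segments(segments, overall_diff, alpha=0.05):
    """segments maps name -> (conv_c, n_c, conv_t, n_t).

    Returns the names of segments whose CI on (treatment - control)
    does not contain overall_diff.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    flagged = []
    for name, (conv_c, n_c, conv_t, n_t) in segments.items():
        p_c, p_t = conv_c / n_c, conv_t / n_t
        diff = p_t - p_c
        # Unpooled standard error of the difference within this segment.
        se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
        lo, hi = diff - z_crit * se, diff + z_crit * se
        if not (lo <= overall_diff <= hi):
            flagged.append(name)
    return flagged


segments = {
    "mobile": (500, 5000, 600, 5000),
    "desktop": (500, 5000, 800, 5000),
}
print(flag_divergent_segments(segments, overall_diff=0.02))
```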
One of:
Provide clear reasoning linking the data to the recommendation.
npx claudepluginhub weisberg/agile_agentic_analytics --plugin ab-testing