From mlforge
This skill should be used when the user asks to "design an experiment", "plan an ablation", "set up an A/B test", "how big a sample do I need", "is this result significant", "design an offline evaluation", "compare model variants", or wants help with power analysis, metric selection, holdouts, or interpreting experiment results for ML/behavioral systems.
Slash command
/mlforge:experiment-design
The statistical conscience of an ML org. Offline (ablations, model comparisons) and online (A/B on behavioral systems). Rigor by default; every shortcut flagged with its risk.
ONE VARIABLE PER ARM
HYPOTHESIS AND SUCCESS CRITERIA WRITTEN BEFORE THE RUN
NO CLAIMED WIN WITHOUT VARIANCE QUANTIFIED
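As one way to quantify variance for an offline comparison, the sketch below computes a paired bootstrap confidence interval for the mean per-example score difference between two model variants scored on the same evaluation set. The function name `paired_bootstrap_ci` and its parameters are illustrative assumptions, not part of mlforge; the method the skill actually uses is described in references/statistical-methods.md.

```python
import random
from statistics import mean


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(scores_b - scores_a) over the same eval examples.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired comparison needs one score per example for each model")

    # Pairing: take the per-example difference first, then resample differences.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible

    # Resample examples with replacement and record the mean difference each time.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))

    lo = boot_means[int((alpha / 2) * n_resamples)]
    hi = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi
```

If the interval excludes zero, the difference is unlikely to be resampling noise on this evaluation set. It does not capture training-seed variance, which needs separate runs.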
Log every experiment via ml-experiment-journal before running. When real numbers are given, compute — run the code, don't approximate.
Methods for power/MDE, CUPED, the delta method for ratio metrics, sequential bounds, paired bootstrap, and seed protocol are in references/statistical-methods.md. Compute in code with the user's numbers.
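To illustrate what "compute in code" can look like for power and MDE, here is a minimal sketch using the standard normal-approximation formula for a two-sided test on two proportions with equal arm sizes. The function names, defaults (alpha 0.05, power 0.80), and example numbers are assumptions for illustration; the formulas in references/statistical-methods.md may differ.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Units per arm needed to detect an absolute lift of mde_abs over p_base.

    Normal approximation, two-sided test, equal allocation (assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_alt = p_base + mde_abs
    # Sum of the Bernoulli variances of the control and treatment arms.
    var_sum = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return ceil((z_alpha + z_beta) ** 2 * var_sum / mde_abs ** 2)


def mde_for_sample_size(p_base, n_per_arm, alpha=0.05, power=0.80):
    """Approximate absolute MDE for a fixed n, using the baseline variance in both arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 * p_base * (1 - p_base) / n_per_arm)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline conversion, 1 point absolute MDE.
    n = sample_size_per_arm(0.10, 0.01)
    print(f"per-arm sample size: {n}")
    print(f"MDE at that n: {mde_for_sample_size(0.10, n):.4f}")
```

Dividing the per-arm sample size by expected daily traffic per arm gives a first estimate of duration, before any adjustment for weekly seasonality or label maturation.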
Pre-registration-style doc: hypothesis, primary metric, unit, MDE + power math (shown), duration, guardrails, analysis plan, stop conditions. Short enough to read; precise enough that analysis is mechanical. After the run, log the outcome in the journal; if the result is not at target, hand off to ml-iterate.
Read ml/PROBLEM.md for the business metric, cost-of-error, and label maturation facts — don't re-derive them. Save pre-registrations to ml/experiments/<name>-prereg.md and append pre_registered to ml/gates.json. When results close out, the journal entry's expected-vs-actual delta feeds ml-retro's calibration audit — the boomerang only works if the prediction was written down first.
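The save-and-record step could be sketched as follows. The paths come from the text above, but the layout of ml/gates.json is not specified there, so this sketch ASSUMES it is a flat JSON list of gate names. The helper name `save_prereg` and its `root` parameter are hypothetical.

```python
import json
from pathlib import Path


def save_prereg(root, name, body):
    """Write ml/experiments/<name>-prereg.md and record the pre_registered gate.

    Hypothetical helper; the gates.json schema used here is an assumption.
    """
    root = Path(root)
    prereg_path = root / "ml" / "experiments" / f"{name}-prereg.md"
    prereg_path.parent.mkdir(parents=True, exist_ok=True)
    prereg_path.write_text(body, encoding="utf-8")

    gates_path = root / "ml" / "gates.json"
    # ASSUMPTION: gates.json holds a flat JSON list of gate names.
    if gates_path.exists():
        gates = json.loads(gates_path.read_text(encoding="utf-8"))
    else:
        gates = []
    if "pre_registered" not in gates:
        gates.append("pre_registered")
    gates_path.write_text(json.dumps(gates, indent=2) + "\n", encoding="utf-8")
    return prereg_path
```

Writing the document before touching gates.json means the gate is never recorded without a pre-registration file on disk.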
npx claudepluginhub mbburabak/mlforge --plugin mlforge