From ds
Verifies ML experiment reproducibility: checks random seeds, library versions, data hashes, git commits, environment files, and result determinism. Use when reviewing an experiment before shipping.
Slash command: /ds:reproducibility-checklist
Verify that an ML experiment can be reproduced by another person on another machine. Walk through each requirement and score the experiment.
Verify all sources of randomness are controlled:
random_state parameter or np.random.seed())
Check: Search the experiment code for random_state, seed, random.seed, np.random.seed, and torch.manual_seed. Every stochastic call should have a fixed seed.
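One way to make the seed check easy to pass is to seed everything in a single place. The sketch below assumes the experiment draws randomness from Python's `random`, and optionally from NumPy and PyTorch when they are installed; the helper name `set_all_seeds` and the default seed are hypothetical and not part of this skill.

```python
import random


def set_all_seeds(seed=42):
    """Seed the common global RNGs and return the names that were seeded.

    Hypothetical helper: NumPy and PyTorch are seeded only if importable.
    """
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global NumPy RNG
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass
    return seeded


# Re-seeding with the same value reproduces the same draw.
set_all_seeds(123)
first_draw = random.random()
set_all_seeds(123)
second_draw = random.random()
print(first_draw == second_draw)
```

Global seeding does not cover estimators that take their own `random_state` argument, so those calls still need an explicit value.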
Verify all library versions are captured:
Check: Look for an "Environment" section in the experiment result. Compare library versions against what was planned.
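The comparison against planned versions can be automated. This is a minimal sketch using the standard library's `importlib.metadata`; the function names and the planned versions shown are hypothetical placeholders. Note that it expects distribution names (for example `scikit-learn`), which can differ from import names (`sklearn`).

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def version_mismatches(planned, actual):
    """Return {package: (planned, actual)} for every package that differs."""
    return {
        name: (want, actual.get(name))
        for name, want in planned.items()
        if actual.get(name) != want
    }


# Hypothetical planned versions, for illustration only.
planned = {"numpy": "1.26.4", "pandas": "2.2.2"}
print(version_mismatches(planned, installed_versions(planned)))
```

An empty result means the environment matches the plan; anything else is a gap to record in the review.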
Verify the exact dataset can be retrieved:
Check: Look for data_hash, SHA-256, or a data snapshot reference in the experiment artifacts. Verify the hash matches the actual file if available.
Verify the exact code state can be recovered:
Check: Look for git_commit or git SHA in the experiment result.
Verify the environment can be recreated:
Check: Look for an "Environment" section. If no requirements file exists, flag as a gap.
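The requirements-file check can be sketched as a small script. The list of candidate file names is an assumption about common project layouts, and both function names are hypothetical; the pin check only looks for an exact `==` and is not a full requirements parser.

```python
from pathlib import Path

# Assumed candidate file names; adjust to the project's conventions.
ENV_FILES = (
    "requirements.txt",
    "environment.yml",
    "pyproject.toml",
    "Pipfile.lock",
    "poetry.lock",
)


def find_environment_files(project_dir="."):
    """Return the known environment files present in project_dir."""
    root = Path(project_dir)
    return [name for name in ENV_FILES if (root / name).is_file()]


def unpinned_requirements(requirements_path):
    """Return requirement lines that lack an exact '==' version pin."""
    loose = []
    for line in Path(requirements_path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blank lines and pip options such as "-r other.txt".
        if line and not line.startswith("-") and "==" not in line:
            loose.append(line)
    return loose


found = find_environment_files(".")
print(found if found else "No environment file found: flag as a gap")
```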
Verify results are deterministic:
Note: Full re-run verification is optional. Flag the experiment if it uses known non-deterministic operations, such as GPU training without torch.use_deterministic_algorithms(True) or multi-threaded data loading.
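When a re-run is affordable, determinism can be checked by running the same entry point twice with the same seed and comparing the outputs. The sketch below uses toy stand-ins for an experiment; `is_deterministic`, `seeded_experiment`, and `unseeded_experiment` are hypothetical names, and a real check would compare the experiment's actual metrics.

```python
import random


def is_deterministic(run, seed, repeats=2):
    """Call run(seed) `repeats` times and report whether all results match."""
    results = [run(seed) for _ in range(repeats)]
    return all(result == results[0] for result in results[1:])


def seeded_experiment(seed):
    """Toy experiment that draws from a local, explicitly seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(5)]


def unseeded_experiment(seed):
    """Toy experiment that ignores its seed (seeded from OS entropy)."""
    rng = random.Random()
    return [rng.random() for _ in range(5)]


print(is_deterministic(seeded_experiment, seed=0))
print(is_deterministic(unseeded_experiment, seed=0))
```

Exact equality is the strictest criterion. For GPU workloads, a documented numeric tolerance may be a more practical comparison.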
Count the number of checked items across all 6 sections:
| Score | Rating | Recommendation |
|---|---|---|
| 16-17 / 17 | Excellent | Ready to ship |
| 12-15 / 17 | Good | Minor gaps -- document and proceed |
| 8-11 / 17 | Fair | Significant gaps -- fix before shipping |
| 0-7 / 17 | Poor | Not reproducible -- requires rework |
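The rating table can be expressed as a small lookup. This is a sketch that hard-codes the 17-item total and the bands from the table above; the function name `rate_reproducibility` is hypothetical.

```python
TOTAL_ITEMS = 17  # total checklist items, as in the table above


def rate_reproducibility(checked):
    """Map the number of checked items to (rating, recommendation)."""
    if not 0 <= checked <= TOTAL_ITEMS:
        raise ValueError(f"checked must be between 0 and {TOTAL_ITEMS}")
    if checked >= 16:
        return ("Excellent", "Ready to ship")
    if checked >= 12:
        return ("Good", "Minor gaps -- document and proceed")
    if checked >= 8:
        return ("Fair", "Significant gaps -- fix before shipping")
    return ("Poor", "Not reproducible -- requires rework")


rating, recommendation = rate_reproducibility(13)
print(f"13 / {TOTAL_ITEMS}: {rating} ({recommendation})")
```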
```python
import hashlib
import importlib
import subprocess
import sys


def _git(*args):
    """Run a git command and return its stripped stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def verify_reproducibility(data_path=None, expected_hash=None):
    """Quick reproducibility verification."""
    report = {}

    # Python version
    report['python'] = sys.version.split()[0]

    # Git commit. check=True makes git failures raise, so a missing git
    # binary or a non-repository directory reaches the fallback below.
    try:
        report['git_commit'] = _git("rev-parse", "HEAD")
        report['git_clean'] = _git("status", "--porcelain") == ""
    except (OSError, subprocess.CalledProcessError):
        report['git_commit'] = 'unavailable'
        report['git_clean'] = False

    # Library versions (libraries that are not installed are skipped)
    libs = ['pandas', 'numpy', 'sklearn', 'scipy', 'statsmodels',
            'aeon', 'xgboost', 'lightgbm', 'matplotlib']
    report['libraries'] = {}
    for lib in libs:
        try:
            mod = importlib.import_module(lib)
            report['libraries'][lib] = getattr(mod, '__version__', 'installed')
        except ImportError:
            pass

    # Data hash, read in chunks so large files do not need to fit in memory
    if data_path:
        h = hashlib.sha256()
        with open(data_path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                h.update(chunk)
        report['data_hash'] = h.hexdigest()
        if expected_hash:
            report['data_hash_match'] = report['data_hash'] == expected_hash

    return report
```
| Failure | Cause | Fix |
|---|---|---|
| Different metrics on re-run | Missing random seed in data split or model | Pass random_state to all stochastic calls |
| Can't install same libraries | No pinned versions | Use pip freeze > requirements.txt at experiment time |
| Data changed between runs | No data hash captured | Hash data files before training |
| Code changed since experiment | No git SHA recorded | Record git rev-parse HEAD in experiment log |
| GPU gives different results | Non-deterministic CUDA operations | Document GPU non-determinism or use torch.use_deterministic_algorithms(True) |
npx claudepluginhub andikarachman/data-science-plugin --plugin ds