Use when writing or reviewing any ML code — enforces shape documentation, Hydra configs, reproducible experiment infrastructure, and opinionated coding standards for ML codebases
Slash command: `/superpowers-extended-cc:ml-code-standards`
Reproducibility and developer productivity through opinionated conventions. Every experiment must be reproducible from day one — not "later."
Core principle: If you can't reproduce it, you didn't do it.
NO EXPERIMENT WITHOUT REPRODUCIBILITY INFRASTRUCTURE FIRST
Set up configs, run scripts, and logging before writing a single training loop.
Always active when writing ML code. No exceptions.
Inline shape comments on every tensor parameter line. One-line docstring max.
<Good>
```python
def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:  # -> (B, T, D)
    # q: (B, T, D)  k: (B, S, D)  v: (B, S, D)
    B, T, D = q.shape
    scores = q @ k.transpose(-2, -1) / D**0.5  # (B, T, S)
    weights = scores.softmax(dim=-1)  # (B, T, S)
    return weights @ v  # (B, T, D)
```
Shapes inline, concise, every line
</Good>

<Bad>
```python
def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    """Compute scaled dot-product attention.

    This function takes query, key, and value tensors and computes
    multi-head scaled dot-product attention as described in
    Vaswani et al. 2017 "Attention Is All You Need."

    Args:
        q: Query tensor of shape (batch_size, seq_len, d_model)
        k: Key tensor of shape (batch_size, src_len, d_model)
        v: Value tensor of shape (batch_size, src_len, d_model)

    Returns:
        Output tensor of shape (batch_size, seq_len, d_model)
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1]**0.5
    return scores.softmax(dim=-1) @ v
```
Docstring longer than the function. Shapes buried in prose.
</Bad>
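Shape comments can also be checked at runtime or in `tests/test_shapes.py`. A minimal sketch, assuming a hypothetical `check_shape` helper (not part of the original skill or of any library) and plain tuples standing in for `tensor.shape`:

```python
from typing import Dict, Sequence


def check_shape(shape: Sequence[int], spec: str, dims: Dict[str, int]) -> Dict[str, int]:
    """Match shape against a spec like "B T D", binding each name to one size."""
    names = spec.split()
    if len(names) != len(shape):
        raise ValueError(f"expected {len(names)} dims {names}, got shape {tuple(shape)}")
    for name, size in zip(names, shape):
        # setdefault binds a name the first time it is seen, then returns the bound size
        if dims.setdefault(name, size) != size:
            raise ValueError(f"dim {name}: expected {dims[name]}, got {size}")
    return dims


# Hypothetical sizes; in real code pass q.shape, k.shape, v.shape
dims: Dict[str, int] = {}
check_shape((2, 5, 16), "B T D", dims)  # q
check_shape((2, 7, 16), "B S D", dims)  # k
check_shape((2, 7, 16), "B S D", dims)  # v
print(dims)  # {'B': 2, 'T': 5, 'D': 16, 'S': 7}
```

Sharing one `dims` dict across calls is what catches a `k` and `v` that disagree on `S`, which a per-tensor check would miss.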
### Hydra Configs
All configuration via Hydra YAML. Never argparse.
```yaml
# configs/train.yaml
model:
  d_model: 512
  n_layers: 6
  n_heads: 8
  dropout: 0.1

trainer:
  lr: 3e-4
  epochs: 100
  batch_size: 64
  checkpoint_every: 10
```

```python
import hydra
from omegaconf import DictConfig

# build_model and train come from the project's src/ package
@hydra.main(config_path="../configs", config_name="train", version_base=None)
def main(cfg: DictConfig):
    model = build_model(cfg.model)
    train(model, cfg.trainer)

if __name__ == "__main__":
    main()
```

CLI overrides: `python scripts/train.py model.d_model=768 trainer.lr=1e-4`
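A typed schema can reject a bad override before a run starts. A minimal sketch, assuming hypothetical `ModelConfig` and `TrainerConfig` dataclasses that mirror `configs/train.yaml`. The divisibility and dropout checks are illustrative assumptions about the model, and wiring these classes into Hydra is not shown:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    d_model: int = 512
    n_layers: int = 6
    n_heads: int = 8
    dropout: float = 0.1

    def __post_init__(self) -> None:
        # Assumed constraint: each attention head gets an equal slice of d_model
        if self.d_model % self.n_heads != 0:
            raise ValueError(f"d_model={self.d_model} not divisible by n_heads={self.n_heads}")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError(f"dropout={self.dropout} outside [0, 1)")


@dataclass
class TrainerConfig:
    lr: float = 3e-4
    epochs: int = 100
    batch_size: int = 64
    checkpoint_every: int = 10


# Same values as the override model.d_model=768
model_cfg = ModelConfig(d_model=768)
print(model_cfg.d_model // model_cfg.n_heads)  # per-head width: 96
```

Failing in `__post_init__` surfaces a bad value at startup instead of as a shape error deep inside the first forward pass.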
`B, T, D = x.shape`
`learning_rate`, `checkpoint_dir`

```
project/
  configs/              # Hydra configs
    train.yaml
    model/
      base.yaml
      large.yaml
  src/
    model/              # Architecture
    data/               # Datasets, transforms, loaders
    training/           # Training loop, callbacks
    evaluation/         # Metrics, eval scripts
  tests/
    test_shapes.py
    test_gradients.py
    test_overfit.py
  scripts/
    train.py            # Entry point
    evaluate.py
    run_train.sh        # Background launch + PID
    stop_train.sh       # Kill from PID
    monitor_train.sh    # tail -f logs
    status_train.sh     # Check running status
  runs/                 # Auto-created, gitignored
    <timestamp>_<name>/
      config.yaml
      train.log
      train.pid
      checkpoints/
```
Every run lives in `runs/<timestamp>_<name>/` with:
- `config.yaml`: frozen config snapshot
- `train.log`: stdout/stderr capture
- `train.pid`: process ID for management
- `checkpoints/`: model checkpoints

Every run is self-contained. No shared state between runs.
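The run layout above can be created from Python as well as from a shell script. A minimal sketch, assuming a hypothetical `make_run_dir` helper; `train.log` is left out because it is produced by redirecting the training process's output:

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def make_run_dir(root: Path, name: str, config_text: str) -> Path:
    """Create <root>/<timestamp>_<name>/ with config.yaml, train.pid, and checkpoints/."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = root / f"{stamp}_{name}"
    # exist_ok=False: fail rather than reuse a directory, so runs never share state
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "checkpoints").mkdir()
    (run_dir / "config.yaml").write_text(config_text)
    (run_dir / "train.pid").write_text(f"{os.getpid()}\n")
    return run_dir


# Demo in a temporary directory; a real project would pass Path("runs")
with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_run_dir(Path(tmp) / "runs", "baseline", "model:\n  d_model: 512\n")
    print(sorted(p.name for p in run_dir.iterdir()))
    # ['checkpoints', 'config.yaml', 'train.pid']
```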
Local instance for experiment tracking:

```bash
mlflow server --host 0.0.0.0 --port 5000
```

Integration pattern:

```python
import mlflow
from omegaconf import OmegaConf

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-project")

with mlflow.start_run(run_name=run_name):
    # Log the fully resolved config as run parameters
    mlflow.log_params(OmegaConf.to_container(cfg, resolve=True))
    for epoch in range(cfg.trainer.epochs):
        loss = train_epoch(model, loader)
        mlflow.log_metric("train_loss", loss, step=epoch)
    mlflow.log_artifact(f"{run_dir}/config.yaml")
```
Refer to MLflow docs for advanced features (model registry, artifact stores, etc.).
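When a nested config dict is passed to `mlflow.log_params`, each top-level section is stored as a single stringified value, which makes runs hard to compare. A minimal sketch, assuming a hypothetical `flatten` helper (not part of MLflow or OmegaConf) that turns the nested dict into dotted keys first:

```python
from typing import Any, Dict


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys such as "model.d_model"."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse with the parent key as the new prefix
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


cfg = {"model": {"d_model": 512, "n_heads": 8}, "trainer": {"lr": 3e-4}}
params = flatten(cfg)
print(params)
# {'model.d_model': 512, 'model.n_heads': 8, 'trainer.lr': 0.0003}
```

The dotted keys match the override syntax (`model.d_model=768`), so a parameter seen in the MLflow UI can be copied straight onto the command line. In the integration pattern, this would be applied to the dict returned by `OmegaConf.to_container(cfg, resolve=True)`.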
run_train.sh:

```bash
#!/bin/bash
set -euo pipefail

# Usage: run_train.sh <run-name> [hydra overrides...]
RUN_DIR="runs/$(date +%Y-%m-%d_%H-%M-%S)_${1:?Usage: run_train.sh <run-name>}"
mkdir -p "$RUN_DIR/checkpoints"
# Copies the base config only; Hydra also writes the composed config,
# including CLI overrides, to $RUN_DIR/.hydra/config.yaml
cp configs/train.yaml "$RUN_DIR/config.yaml"

# Launch in the background, capture stdout/stderr, and record the PID
nohup python scripts/train.py hydra.run.dir="$RUN_DIR" "${@:2}" > "$RUN_DIR/train.log" 2>&1 &
echo $! > "$RUN_DIR/train.pid"

echo "Started training: $RUN_DIR"
echo "PID: $(cat "$RUN_DIR/train.pid")"
echo "Logs: tail -f $RUN_DIR/train.log"
```

stop_train.sh:

```bash
#!/bin/bash
# Default to the most recently modified run directory
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
PID_FILE="$RUN_DIR/train.pid"
if [ ! -f "$PID_FILE" ]; then echo "No PID file in $RUN_DIR"; exit 1; fi
PID=$(cat "$PID_FILE")
if kill -0 "$PID" 2>/dev/null; then kill "$PID"; echo "Killed PID $PID ($RUN_DIR)"
else echo "Process $PID not running"; fi
```

monitor_train.sh:

```bash
#!/bin/bash
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
echo "Monitoring: $RUN_DIR"
tail -f "$RUN_DIR/train.log"
```

status_train.sh:

```bash
#!/bin/bash
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
PID=$(cat "$RUN_DIR/train.pid" 2>/dev/null)
# kill -0 tests whether the process exists without sending a signal
if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; then
  echo "RUNNING (PID $PID) — $RUN_DIR"; tail -3 "$RUN_DIR/train.log"
else echo "STOPPED — $RUN_DIR"; tail -5 "$RUN_DIR/train.log"; fi
```
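The `kill -0` check used by the scripts has a direct Python equivalent, useful when a notebook or a test needs to know whether a run is still alive. A minimal sketch, assuming a hypothetical `is_running` helper and a POSIX system:

```python
import os
import tempfile
from pathlib import Path


def is_running(pid_file: Path) -> bool:
    """Return True if the PID stored in pid_file belongs to a live process (POSIX only)."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or its contents are not a number
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


# Demo: this process's own PID is alive; a missing PID file counts as stopped
with tempfile.TemporaryDirectory() as tmp:
    pid_file = Path(tmp) / "train.pid"
    pid_file.write_text(f"{os.getpid()}\n")
    print(is_running(pid_file))               # True
    print(is_running(Path(tmp) / "missing"))  # False
```

The operating system can reuse a PID after the original process exits, so a stale `train.pid` may occasionally report a run as alive. The shell scripts share this limitation.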
Every project must have a concise Quick Start in README with: setup, train, monitor, stop, and override commands.
| Excuse | Reality |
|---|---|
| "I'll organize runs later" | Later never comes. Infra first. |
| "I can just check the terminal" | Terminal is gone after logout. nohup + log capture. |
| "MLflow is overkill for now" | Local MLflow takes 2 minutes to set up. Comparing 10 runs manually takes hours. |
| "I know which config I used" | You don't. 3 days from now you won't remember. Config saved per run. |
| "I'll write the scripts when I need them" | You need them from run 1. Set up once, use forever. |
configs/ directory

Any of these means infrastructure is missing. Fix it before continuing.
npx claudepluginhub rishikanthc/ml-superpowers --plugin superpowers-extended-cc