Skill

ml-code-standards

Use when writing or reviewing any ML code — enforces shape documentation, Hydra configs, reproducible experiment infrastructure, and opinionated coding standards for ML codebases

Invocation

Slash command

/superpowers-extended-cc:ml-code-standards

User invocable

Model invocable

Inline context

Default effort

SKILL.md

253 lines · ~1.8k tokens

Stats

Language: Shell

Stars: 0

Maintenance: Excellent

Last commit: Feb 23, 2026

ML Code Standards

Overview

Reproducibility and developer productivity through opinionated conventions. Every experiment must be reproducible from day one — not "later."

Core principle: If you can't reproduce it, you didn't do it.

The Iron Law

NO EXPERIMENT WITHOUT REPRODUCIBILITY INFRASTRUCTURE FIRST

Set up configs, run scripts, and logging before writing a single training loop.

When to Use

Always active when writing ML code. No exceptions.

Opinionated Code Style

Shape Documentation

Put inline shape comments on tensor parameters and on every line that produces a tensor. One-line docstring max.

<Good>
```python
from torch import Tensor

def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:  # -> (B, T, D)
    # q: (B, T, D)  k: (B, S, D)  v: (B, S, D)
    B, T, D = q.shape
    scores = q @ k.transpose(-2, -1) / D**0.5  # (B, T, S)
    weights = scores.softmax(dim=-1)           # (B, T, S)
    return weights @ v                         # (B, T, D)
```
Shapes inline, concise, every line.
</Good>

<Bad>
```python
def attention(q: Tensor, k: Tensor, v: Tensor) -> Tensor:
    """Compute scaled dot-product attention.

    This function takes query, key, and value tensors and computes
    multi-head scaled dot-product attention as described in
    Vaswani et al. 2017 "Attention Is All You Need."

    Args:
        q: Query tensor of shape (batch_size, seq_len, d_model)
        k: Key tensor of shape (batch_size, src_len, d_model)
        v: Value tensor of shape (batch_size, src_len, d_model)

    Returns:
        Output tensor of shape (batch_size, seq_len, d_model)
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1]**0.5
    return scores.softmax(dim=-1) @ v
```
Docstring longer than the function. Shapes buried in prose.
</Bad>

Hydra Configs

All configuration via Hydra YAML. Never argparse.

<Good>
```yaml
# configs/train.yaml
model:
  d_model: 512
  n_layers: 6
  n_heads: 8
  dropout: 0.1

trainer:
  lr: 3e-4
  epochs: 100
  batch_size: 64
  checkpoint_every: 10
```

```python
# scripts/train.py
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="../configs", config_name="train", version_base=None)
def main(cfg: DictConfig):
    model = build_model(cfg.model)  # project-defined
    train(model, cfg.trainer)       # project-defined


if __name__ == "__main__":
    main()
```

CLI overrides: python scripts/train.py model.d_model=768 trainer.lr=1e-4
</Good>

<Bad>
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--d-model', type=int, default=512)
parser.add_argument('--lr', type=float, default=3e-4)
# 40 more lines of this
```
No composition, no saved configs, no override syntax.
</Bad>

Variable Naming

Short for math: B, T, D = x.shape
Long for infra: learning_rate, checkpoint_dir
No redundant comments on obvious code
No type hints on local math vars (noise, not signal)

Project Structure

```
project/
  configs/              # Hydra configs
    train.yaml
    model/
      base.yaml
      large.yaml
  src/
    model/              # Architecture
    data/               # Datasets, transforms, loaders
    training/           # Training loop, callbacks
    evaluation/         # Metrics, eval scripts
  tests/
    test_shapes.py
    test_gradients.py
    test_overfit.py
  scripts/
    train.py            # Entry point
    evaluate.py
    run_train.sh        # Background launch + PID
    stop_train.sh       # Kill from PID
    monitor_train.sh    # tail -f logs
    status_train.sh     # Check running status
  runs/                 # Auto-created, gitignored
    <timestamp>_<name>/
      config.yaml
      train.log
      train.pid
      checkpoints/
```

Reproducible Experiment Infrastructure

Run Organization

Every run lives in runs/<timestamp>_<name>/ with:

config.yaml — frozen config snapshot
train.log — stdout/stderr capture
train.pid — process ID for management
checkpoints/ — model checkpoints

Every run self-contained. No shared state between runs.
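The same layout can also be created from Python, for example when a notebook or sweep launcher starts runs instead of run_train.sh. This is a minimal sketch, assuming a single base config file is copied as the frozen snapshot. make_run_dir and its parameters are hypothetical names, not part of Hydra or MLflow.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_run_dir(name: str, config_file: Path, runs_root: Path = Path("runs")) -> Path:
    """Create runs/<timestamp>_<name>/ holding a frozen config and checkpoints/."""
    # Same timestamp format as run_train.sh: YYYY-MM-DD_HH-MM-SS
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    run_dir = Path(runs_root) / f"{stamp}_{name}"
    # exist_ok=False: refuse to reuse a directory that already belongs to a run
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "checkpoints").mkdir()
    # Copy, not symlink: later edits to configs/ must not change this snapshot
    shutil.copyfile(config_file, run_dir / "config.yaml")
    return run_dir
```

Failing on an existing directory makes a name collision loud instead of silently mixing two runs' files, which keeps "no shared state between runs" enforceable. Typical use: make_run_dir("baseline", Path("configs/train.yaml")).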

MLflow Integration

Local instance for experiment tracking:

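```bash
# 0.0.0.0 listens on all interfaces; use 127.0.0.1 to keep the server local to this machine
mlflow server --host 0.0.0.0 --port 5000
```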

Integration pattern:

```python
import mlflow
from omegaconf import OmegaConf

mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my-project")

# cfg comes from @hydra.main; run_name, run_dir, model, loader and
# train_epoch are project-specific and defined elsewhere.
with mlflow.start_run(run_name=run_name):
    mlflow.log_params(OmegaConf.to_container(cfg, resolve=True))
    for epoch in range(cfg.trainer.epochs):
        loss = train_epoch(model, loader)
        mlflow.log_metric("train_loss", loss, step=epoch)
    mlflow.log_artifact(f"{run_dir}/config.yaml")
```

Refer to MLflow docs for advanced features (model registry, artifact stores, etc.).
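mlflow.log_params stores each value as a string, so passing the nested config dict logs one stringified blob per top-level key (model, trainer) rather than one searchable param per hyperparameter. This is a minimal sketch of flattening the container first. flatten_params is a hypothetical helper, and the plain dict stands in for the output of OmegaConf.to_container(cfg, resolve=True).

```python
from typing import Any, Dict


def flatten_params(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"model": {"d_model": 512}} into {"model.d_model": 512}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))  # recurse into sub-configs
        else:
            flat[name] = value  # leaves (numbers, strings, lists) are kept as-is
    return flat


# Plain dict standing in for OmegaConf.to_container(cfg, resolve=True)
cfg = {
    "model": {"d_model": 512, "n_layers": 6, "n_heads": 8, "dropout": 0.1},
    "trainer": {"lr": 3e-4, "epochs": 100, "batch_size": 64, "checkpoint_every": 10},
}
params = flatten_params(cfg)
# Inside the run: mlflow.log_params(params), one MLflow param per hyperparameter
```

Dotted keys mirror Hydra's override syntax, so a param name seen in the MLflow UI (for example trainer.lr) can be pasted straight back as a CLI override.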

Background Training Scripts

run_train.sh:

```bash
#!/bin/bash
set -euo pipefail
RUN_DIR="runs/$(date +%Y-%m-%d_%H-%M-%S)_${1:?Usage: run_train.sh <run-name>}"
mkdir -p "$RUN_DIR/checkpoints"
# Snapshot of the base config; Hydra also writes the resolved config
# (with CLI overrides applied) to $RUN_DIR/.hydra/
cp configs/train.yaml "$RUN_DIR/config.yaml"
nohup python scripts/train.py hydra.run.dir="$RUN_DIR" "${@:2}" > "$RUN_DIR/train.log" 2>&1 &
echo $! > "$RUN_DIR/train.pid"
echo "Started training: $RUN_DIR"
echo "PID: $(cat "$RUN_DIR/train.pid")"
echo "Logs: tail -f $RUN_DIR/train.log"
```

stop_train.sh:

```bash
#!/bin/bash
# Defaults to the most recent run directory
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
PID_FILE="$RUN_DIR/train.pid"
if [ ! -f "$PID_FILE" ]; then echo "No PID file in $RUN_DIR"; exit 1; fi
PID=$(cat "$PID_FILE")
if kill -0 "$PID" 2>/dev/null; then
    kill "$PID"
    echo "Killed PID $PID ($RUN_DIR)"
else
    echo "Process $PID not running"
fi
```

monitor_train.sh:

```bash
#!/bin/bash
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
echo "Monitoring: $RUN_DIR"
tail -f "$RUN_DIR/train.log"
```

status_train.sh:

```bash
#!/bin/bash
RUN_DIR="${1:-$(ls -td runs/*/ | head -1)}"
PID=$(cat "$RUN_DIR/train.pid" 2>/dev/null)
if [ -n "$PID" ] && kill -0 "$PID" 2>/dev/null; then
    echo "RUNNING (PID $PID) — $RUN_DIR"
    tail -3 "$RUN_DIR/train.log"
else
    echo "STOPPED — $RUN_DIR"
    tail -5 "$RUN_DIR/train.log"
fi
```
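The liveness check used by stop_train.sh and status_train.sh (kill -0) has a direct Python equivalent, which can be handy from a notebook or a launcher that manages several runs. This is a minimal, POSIX-only sketch; latest_run_dir and is_running are hypothetical helper names that mirror the shell scripts.

```python
import os
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: Path = Path("runs")) -> Optional[Path]:
    """Most recently modified run directory, like `ls -td runs/*/ | head -1`."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


def is_running(run_dir: Path) -> bool:
    """True if run_dir/train.pid names a live process (the `kill -0` check)."""
    pid_file = Path(run_dir) / "train.pid"
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or it does not hold a number
    if pid <= 0:
        return False  # 0 and negative PIDs address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```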

Run Instructions Requirement

Every project must have a concise Quick Start in its README covering setup, training, monitoring, stopping, and config overrides.

Common Rationalizations

| Excuse | Reality |
|---|---|
| "I'll organize runs later" | Later never comes. Infra first. |
| "I can just check the terminal" | Terminal is gone after logout. nohup + log capture. |
| "MLflow is overkill for now" | Local MLflow takes 2 minutes to set up. Comparing 10 runs manually takes hours. |
| "I know which config I used" | You don't. 3 days from now you won't remember. Config saved per run. |
| "I'll write the scripts when I need them" | You need them from run 1. Set up once, use forever. |

Red Flags — STOP and Fix

Functions touching tensors without shape comments
argparse in new ML code
Docstrings longer than the function body
Magic numbers without config (hardcoded 512, 0.001, etc.)
No configs/ directory
Any training run without checkpointing enabled
Training started in foreground terminal (use nohup scripts)

Any of these mean: infrastructure is missing. Fix it before continuing.
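For the checkpointing flag, here is a minimal sketch of tying trainer.checkpoint_every to each run's checkpoints/ directory. The helper names and the epoch_NNNN.pt naming scheme are assumptions for illustration; the actual save and load calls (for example torch.save and torch.load) stay in the training loop.

```python
from pathlib import Path
from typing import Optional


def should_checkpoint(epoch: int, checkpoint_every: int, epochs: int) -> bool:
    """Save every `checkpoint_every` epochs (0-indexed) and always on the last epoch."""
    if checkpoint_every < 1:
        raise ValueError("checkpoint_every must be >= 1")
    return (epoch + 1) % checkpoint_every == 0 or epoch == epochs - 1


def checkpoint_path(run_dir: Path, epoch: int) -> Path:
    # Zero-padded so lexical order matches epoch order (up to 9999 epochs)
    return Path(run_dir) / "checkpoints" / f"epoch_{epoch:04d}.pt"


def latest_checkpoint(run_dir: Path) -> Optional[Path]:
    """Newest checkpoint in run_dir/checkpoints/, or None if there is none yet."""
    checkpoints = sorted((Path(run_dir) / "checkpoints").glob("epoch_*.pt"))
    return checkpoints[-1] if checkpoints else None
```

Always saving the final epoch means a run whose length is not a multiple of checkpoint_every still ends with a usable checkpoint, and latest_checkpoint gives a resume point after a crash or a stop_train.sh.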
