From skillry-data-ml-ai-engineering
Use when you need to review or clean up Jupyter notebooks — out-of-order cells, hidden state, parameterization, nbconvert/papermill execution, output bloat, and secrets that must stay out of notebooks.
Slash command: /skillry-data-ml-ai-engineering:315-notebook-hygiene
Review and clean Jupyter notebooks so they are reproducible, reviewable, and safe to commit. Cover execution-order discipline (no out-of-order or stale cells), hidden kernel state, parameterization for automated runs (papermill), headless execution and validation (nbconvert), output and metadata bloat in version control, secrets accidentally embedded in cells or outputs, and extracting reusable logic into importable modules. The goal is a notebook that runs top-to-bottom from a fresh kernel, contains no credentials, and does not poison code review with megabytes of output diffs.
.ipynb file that will be committed or shared.
.py form: use the ml-training-pipeline-review skill.
# List notebooks, skipping Jupyter's .ipynb_checkpoints copies
find . -name "*.ipynb" | grep -v ipynb_checkpoints | head -40
# Is output being committed? Check for outputs/execution_count in tracked notebooks
grep -l '"output_type"' $(find . -name "*.ipynb" | grep -v checkpoints) 2>/dev/null | head -20
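The grep only shows that outputs exist, not how much they weigh. Below is a minimal standard-library sketch that measures serialized output size per notebook. The helper names output_bytes and report_bloat and the 100 KB default threshold are assumptions for illustration. They are not values from any tool.

```python
import glob
import json


def output_bytes(nb: dict) -> int:
    """Approximate size of all code-cell outputs, measured as serialized JSON bytes."""
    total = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            total += len(json.dumps(cell.get("outputs", [])).encode("utf-8"))
    return total


def report_bloat(pattern: str = "**/*.ipynb", threshold: int = 100_000) -> list:
    """(path, output_bytes) for notebooks whose outputs exceed the threshold, largest first."""
    flagged = []
    for path in glob.glob(pattern, recursive=True):
        if ".ipynb_checkpoints" in path:
            continue
        with open(path, encoding="utf-8") as f:
            size = output_bytes(json.load(f))
        if size > threshold:
            flagged.append((path, size))
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Run report_bloat() from the repository root and review the largest offenders first. Those are the notebooks whose diffs hurt code review most.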
# Extract execution_count values per notebook. Non-monotonic counts mean cells ran out of order;
# a mix of null and non-null counts means some cells were never run (stale).
# All-null is what a stripped (nbstripout) notebook looks like, so it is not flagged.
python3 - <<'PY'
import glob
import json

for nb in glob.glob("**/*.ipynb", recursive=True):
    if ".ipynb_checkpoints" in nb:
        continue
    with open(nb, encoding="utf-8") as f:
        cells = json.load(f).get("cells", [])
    raw = [c.get("execution_count") for c in cells if c.get("cell_type") == "code"]
    counts = [n for n in raw if n is not None]
    out_of_order = any(b < a for a, b in zip(counts, counts[1:]))
    partially_run = bool(counts) and len(counts) < len(raw)
    if out_of_order or partially_run:
        print("OUT-OF-ORDER or UNRUN:", nb, counts[:12])
PY
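# Scan notebook sources and stored outputs for credential-like strings
grep -rniE "api[_-]?key|secret|token|password|aws_access|bearer |sk-[a-z0-9]" $(find . -name "*.ipynb" | grep -v checkpoint) | head -25
Treat every match, in source or in output, as a suspected leak. Confirm each one by inspection, because patterns like token also hit tokenizer code. Then flag real credentials for rotation and removal.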
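The grep cannot tell you which cell matched or whether the hit sits in source or in output. Below is a sketch that walks the parsed notebook JSON and reports notebook:cell locations. It reuses the same loose patterns as the grep. The scan_secrets name and the choice to scan only text/* rich outputs are assumptions. This does not replace a dedicated scanner such as detect-secrets.

```python
import re

# Same loose heuristics as the grep; expect false positives such as "tokenizer".
SECRET_PATTERN = re.compile(
    r"api[_-]?key|secret|token|password|aws_access|bearer |sk-[a-z0-9]",
    re.IGNORECASE,
)


def _as_text(value) -> str:
    """nbformat stores multiline text either as one string or as a list of lines."""
    if isinstance(value, list):
        return "".join(str(v) for v in value)
    return value if isinstance(value, str) else ""


def scan_secrets(nb: dict, name: str = "notebook") -> list:
    """Return 'name:cell | source-or-output | matched text' for each suspicious hit."""
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        chunks = [("source", _as_text(cell.get("source")))]
        for out in cell.get("outputs", []):
            # Stream text, error messages and tracebacks, then text/* rich outputs.
            for key in ("text", "evalue", "traceback"):
                if key in out:
                    chunks.append(("output", _as_text(out[key])))
            for mime, data in out.get("data", {}).items():
                if mime.startswith("text/"):
                    chunks.append(("output", _as_text(data)))
        for where, text in chunks:
            for match in SECRET_PATTERN.finditer(text):
                hits.append(f"{name}:{i} | {where} | {match.group(0)}")
    return hits
```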
# Headless top-to-bottom execution; fails if any cell errors out of a fresh kernel
jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=600 \
--output /tmp/_check.ipynb path/to/notebook.ipynb
Confirm scheduled notebooks declare a parameters-tagged cell (papermill), and that heavy reusable logic (data loaders, model code) is imported from .py modules rather than copy-pasted across notebooks.
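Papermill locates the parameters cell by its cell tag, which nbformat v4 stores under the cell's metadata.tags list. If no cell carries the tag, papermill inserts its values at the top of the notebook, where a later assignment can override them. Below is a minimal check, assuming nbformat v4 JSON. Both function names are hypothetical.

```python
import json
from typing import Optional


def parameters_cell_index(nb: dict) -> Optional[int]:
    """Index of the first cell tagged 'parameters', or None if no cell carries the tag."""
    for i, cell in enumerate(nb.get("cells", [])):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            return i
    return None


def missing_parameters_cell(paths: list) -> list:
    """Scheduled notebooks (given as file paths) that have no parameters-tagged cell."""
    missing = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            if parameters_cell_index(json.load(f)) is None:
                missing.append(path)
    return missing
```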
execution_count are stripped before commit (nbstripout or a clean-cell policy).
parameters-tagged cell and run via papermill, not manual clicks.
.py modules, not duplicated across notebooks.
requirements/environment reference exists so the kernel is reproducible.
Strip outputs and execution counts before commit (nbstripout):
pip install nbstripout
# One-off clean:
nbstripout path/to/notebook.ipynb
# Repo-wide via git filter so outputs never get committed:
nbstripout --install
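During review, it helps to know what "clean" means at the JSON level. Below is a simplified, standard-library stand-in for the code-cell part of that cleanup. It is not nbstripout's implementation. It only clears outputs and execution counts and leaves all metadata untouched. The function names are hypothetical.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Clear outputs and execution counts of every code cell, in place; returns nb."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_file(path: str) -> None:
    """Rewrite a notebook on disk without outputs."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 and sorted keys keep the file close to how Jupyter saves notebooks,
        # so the cleanup itself does not add a large whitespace-only diff.
        json.dump(strip_notebook(nb), f, indent=1, sort_keys=True, ensure_ascii=False)
        f.write("\n")
```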
Parameterized, headless run with papermill (the cell holding the defaults must be tagged parameters):
# --- cell tagged "parameters" ---
start_date = "2026-06-01"
threshold = 0.5
output_path = "out/report.parquet"
# Inject parameters and execute headless; output notebook is an audit artifact.
papermill report.ipynb out/report_2026-06-01.ipynb \
-p start_date 2026-06-01 -p threshold 0.7
Validate reproducibility in CI (nbconvert):
# Fail the build if the notebook does not run cleanly from a fresh kernel.
jupyter nbconvert --to notebook --execute \
--ExecutePreprocessor.timeout=900 \
--output-dir /tmp/nb-ci notebooks/*.ipynb
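nbconvert --execute already exits non-zero when a cell raises. A cheaper static check is useful as a first CI step or during review: flag notebooks committed with error outputs still stored in them, which usually means nobody re-ran them cleanly. Below is a sketch over nbformat v4 JSON. The helper names are hypothetical.

```python
import json


def error_cells(nb: dict) -> list:
    """(cell index, 'ename: evalue') for every stored error output."""
    found = []
    for i, cell in enumerate(nb.get("cells", [])):
        for out in cell.get("outputs", []):
            if out.get("output_type") == "error":
                found.append((i, f"{out.get('ename', '?')}: {out.get('evalue', '')}"))
    return found


def check_paths(paths: list) -> int:
    """Print offending cells and return a process exit code (1 if any error was found)."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for index, message in error_cells(json.load(f)):
                print(f"{path}:{index} | stored error output | {message}")
                failed = True
    return 1 if failed else 0
```

In CI, pass the notebook paths to check_paths and use its return value as the exit status. Do this before the slower nbconvert execution step.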
Pre-commit hook to block secrets and committed outputs:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1
    hooks: [{ id: nbstripout }]
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    # The baseline must exist first: detect-secrets scan > .secrets.baseline
    hooks: [{ id: detect-secrets, args: ["--baseline", ".secrets.baseline"] }]
Produce a structured report with:
notebook:cell | issue | risk | concrete fix.
**** and flag the credential for rotation.
nbstripout --install and a secrets pre-commit hook rather than manual cleanup alone.
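The sketch below emits rows in the notebook:cell | issue | risk | concrete fix shape and masks secret values as ****, so the report does not itself leak the credential. The Finding class, the redact helper and the assignment-matching regex are hypothetical. The regex is deliberately narrow: it only catches quoted values assigned to names containing key, secret, token or password.

```python
import re
from dataclasses import dataclass

# Hypothetical, narrow pattern for assignments such as API_KEY = "abc123".
ASSIGNMENT = re.compile(
    r"""(?P<name>\w*(?:key|secret|token|password)\w*\s*=\s*)(?P<q>["'])(?P<value>[^"']+)(?P=q)""",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace quoted values assigned to secret-looking names with ****."""
    return ASSIGNMENT.sub(lambda m: f"{m.group('name')}{m.group('q')}****{m.group('q')}", text)


@dataclass
class Finding:
    notebook: str
    cell: int
    issue: str
    risk: str
    fix: str

    def row(self) -> str:
        """Render as 'notebook:cell | issue | risk | concrete fix' with secrets masked."""
        return redact(f"{self.notebook}:{self.cell} | {self.issue} | {self.risk} | {self.fix}")
```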
npx claudepluginhub fluxonlab/skillry --plugin skillry-data-ml-ai-engineering