Tracks AI/ML model versions using MLflow: logs hyperparameters/metrics, registers models, manages Staging/Production stages, compares performance, generates model cards.
Slash command: `/model-versioning-tracker:tracking-model-versions`
Track and manage AI/ML model versions using MLflow, DVC, or Weights & Biases. Log model metadata (hyperparameters, training data hash, framework version), record evaluation metrics (accuracy, F1, latency), manage model registry transitions (Staging, Production, Archived), and generate model cards documenting lineage and performance.
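Prerequisites:

- An MLflow tracking server (local `mlflow server` or managed MLflow)
- `mlflow`, `pandas`, and the relevant ML framework installed

Steps:

1. Set `MLFLOW_TRACKING_URI` and verify connectivity with `mlflow experiments list` (recent MLflow releases expose this command as `mlflow experiments search`).
2. Create an experiment with `mlflow experiments create --experiment-name <name>`.
3. Log hyperparameters and metrics for the run, and log the model artifact with `mlflow.<flavor>.log_model()`.
4. Register the model by calling `mlflow.register_model()` with the run URI and a descriptive model name.
5. Move versions through None -> Staging -> Production using `client.transition_model_version_stage()`. Archive previous production versions.
6. Compare versions by querying `mlflow.search_runs()` and generating comparison tables showing metric improvements between versions.
7. Generate a model card from the template at `${CLAUDE_SKILL_DIR}/assets/model_card_template.md`.

See `${CLAUDE_SKILL_DIR}/assets/example_mlflow_workflow.yaml` for a complete workflow configuration.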
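The registration and stage-transition steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's own implementation: `model_uri_for_run`, `promote`, and `register_and_stage` are hypothetical helper names, the `model` artifact path is assumed, and `client` is assumed to be an `mlflow.tracking.MlflowClient` (or any object exposing the same `transition_model_version_stage` method). Recent MLflow releases deprecate registry stages in favor of aliases, so check the behavior against the installed version.

```python
VALID_STAGES = {"None", "Staging", "Production", "Archived"}


def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build the runs:/ URI that mlflow.register_model() accepts.

    The "model" artifact path is an assumption; it must match the path
    used when the model was logged.
    """
    return f"runs:/{run_id}/{artifact_path}"


def promote(client, model_name: str, version, stage: str) -> None:
    """Move one registered model version to the given stage.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    Versions already in Production are archived when promoting to Production.
    """
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(VALID_STAGES)}")
    client.transition_model_version_stage(
        name=model_name,
        version=str(version),
        stage=stage,
        archive_existing_versions=(stage == "Production"),
    )


def register_and_stage(run_id: str, model_name: str) -> str:
    """Register the model logged by `run_id` and move it to Staging.

    Requires mlflow to be installed and a reachable tracking server,
    so the import is deferred until the function is called.
    """
    import mlflow
    from mlflow.tracking import MlflowClient

    registered = mlflow.register_model(model_uri_for_run(run_id), model_name)
    promote(MlflowClient(), model_name, registered.version, "Staging")
    return registered.version
```

Passing the client in as an argument keeps `promote` easy to dry-run against a stub before pointing it at a real registry.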
Tracking a new image classification model version: Log a ResNet-50 fine-tuned on a custom dataset. Record hyperparameters (lr=0.001, epochs=50, optimizer=Adam), metrics (val_accuracy=0.94, val_loss=0.18, inference_latency_ms=12), and the serialized model artifact. Register as version 3 in the model registry and transition to Staging for validation.
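The logging part of this example might look like the sketch below. The parameter and metric values come from the example itself; the helper name `log_training_run`, the run name, and the choice to pass the tracker in as an argument are assumptions made for illustration. In real use `tracker` would be the `mlflow` module, and the model artifact would be logged inside the same run, for instance with `mlflow.pytorch.log_model()`.

```python
RESNET_PARAMS = {"lr": 0.001, "epochs": 50, "optimizer": "Adam"}
RESNET_METRICS = {"val_accuracy": 0.94, "val_loss": 0.18, "inference_latency_ms": 12}


def log_training_run(tracker, params: dict, metrics: dict, run_name: str) -> str:
    """Log params and metrics inside one active run and return its run ID.

    `tracker` is assumed to be the mlflow module (or an object with the same
    start_run / log_params / log_metrics functions). Everything is logged
    inside the `with` block so the run is still active when logging happens.
    """
    with tracker.start_run(run_name=run_name) as run:
        tracker.log_params(params)
        tracker.log_metrics(metrics)
        return run.info.run_id


# Usage with a real tracking server (requires mlflow to be installed):
#   import mlflow
#   run_id = log_training_run(mlflow, RESNET_PARAMS, RESNET_METRICS, "resnet50-finetune")
```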
Comparing model versions before production promotion: Query MLflow for all versions of the sentiment-analysis model. Generate a comparison table showing accuracy improved from 0.87 (v2) to 0.91 (v3) while inference latency increased from 8ms to 15ms. Recommend promoting v3 to Production only if latency is acceptable for the use case.
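A comparison like this can be produced with a few lines of plain Python once the metrics are in hand. The sketch below uses the numbers from the example; the function names and the 20 ms latency budget are hypothetical, and in practice the metric dictionaries would come from `mlflow.search_runs()` or the registry rather than being typed in.

```python
def comparison_table(old_label: str, old: dict, new_label: str, new: dict) -> str:
    """Render a Markdown table of metrics shared by two model versions."""
    lines = [
        f"| Metric | {old_label} | {new_label} | Change |",
        "|---|---|---|---|",
    ]
    for metric, old_value in old.items():
        if metric not in new:
            continue  # only compare metrics recorded for both versions
        delta = round(new[metric] - old_value, 4)
        lines.append(f"| {metric} | {old_value} | {new[metric]} | {delta:+g} |")
    return "\n".join(lines)


def meets_latency_budget(metrics: dict, budget_ms: float) -> bool:
    """Return True when the recorded inference latency fits the budget."""
    return metrics["inference_latency_ms"] <= budget_ms


v2 = {"accuracy": 0.87, "inference_latency_ms": 8}
v3 = {"accuracy": 0.91, "inference_latency_ms": 15}

print(comparison_table("v2", v2, "v3", v3))
# The 20 ms budget is a made-up threshold; use the real requirement of the use case.
print("Promote v3:", meets_latency_budget(v3, budget_ms=20))
```

Keeping the latency check separate from the table makes the promotion rule explicit: v3 wins on accuracy, and the budget decides whether the added latency is acceptable.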
Generating a model card for compliance review: Extract metadata from MLflow model registry version 5: training dataset (100K customer reviews), evaluation results (F1=0.89 on held-out test set), known limitations (struggles with sarcasm and multilingual input), and intended use (customer feedback classification). Output a structured Markdown model card.
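One way to turn that metadata into a Markdown card is a small template render, sketched below. The template here is a stand-in written for illustration and is not the content of `model_card_template.md`; the model name `customer-feedback-classifier` and the function name `render_model_card` are likewise hypothetical. The field values are taken from the example.

```python
from string import Template

# Illustrative template only; the skill's real template file may differ.
CARD_TEMPLATE = Template(
    "# Model Card: $name (version $version)\n\n"
    "## Intended use\n$intended_use\n\n"
    "## Training data\n$training_data\n\n"
    "## Evaluation\n$evaluation\n\n"
    "## Known limitations\n$limitations\n"
)


def render_model_card(name, version, intended_use, training_data, evaluation, limitations):
    """Fill the template; `limitations` is a list rendered as Markdown bullets."""
    return CARD_TEMPLATE.substitute(
        name=name,
        version=version,
        intended_use=intended_use,
        training_data=training_data,
        evaluation=evaluation,
        limitations="\n".join(f"- {item}" for item in limitations),
    )


card = render_model_card(
    name="customer-feedback-classifier",  # hypothetical model name
    version=5,
    intended_use="Customer feedback classification",
    training_data="100K customer reviews",
    evaluation="F1=0.89 on held-out test set",
    limitations=["Struggles with sarcasm", "Struggles with multilingual input"],
)
print(card)
```

`Template.substitute` raises `KeyError` when a field is missing, which is useful for compliance documents where a silently empty section would be worse than a failure.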
| Error | Cause | Solution |
|---|---|---|
| MLflow connection refused | Tracking server not running or wrong URI | Verify `MLFLOW_TRACKING_URI` is correct; start server with `mlflow server --host 0.0.0.0 --port 5000` |
| Artifact upload failed | Insufficient permissions on artifact store | Check S3/GCS bucket permissions; verify IAM role has write access to the artifact path |
| Model registration conflict | Model name already exists with incompatible schema | Use a versioned model name or delete the conflicting registry entry |
| Metrics not logged | MLflow run ended before logging completed | Ensure all `log_metric()` calls happen within the active run context (`with mlflow.start_run():`) |
| Stage transition denied | Model version already in target stage | Archive the existing version in that stage first, then retry the transition; alternatively pass `archive_existing_versions=True` to `client.transition_model_version_stage()` |
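For the connection-refused case, a quick probe can confirm whether the tracking server answers before any logging is attempted. The sketch below assumes an HTTP(S) tracking URI and assumes the server exposes a `/health` endpoint, as the open-source `mlflow server` does; managed offerings may behave differently. The function name `tracking_server_reachable` is hypothetical.

```python
import os
import urllib.error
import urllib.request


def tracking_server_reachable(tracking_uri: str, timeout: float = 2.0) -> bool:
    """Return True if GET <tracking_uri>/health answers with HTTP 200.

    Only http(s) URIs can be probed this way; file-based or database
    tracking URIs are rejected with ValueError.
    """
    if not tracking_uri.startswith(("http://", "https://")):
        raise ValueError("only http(s) tracking URIs can be probed")
    health_url = tracking_uri.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts, and HTTP errors.
        return False


uri = os.environ.get("MLFLOW_TRACKING_URI", "")
if uri.startswith(("http://", "https://")):
    print("tracking server reachable:", tracking_server_reachable(uri))
```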
Install: `npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin model-versioning-tracker`