Skill

tracking-model-versions

Tracks AI/ML model versions using MLflow: logs hyperparameters/metrics, registers models, manages Staging/Production stages, compares performance, generates model cards.

Python

ai-ml

Popularity

Parent stars

2,199

Parent forks

296

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/model-versioning-tracker:tracking-model-versions

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Write, Edit, Grep, Glob, Bash(cmd:*)

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Track and manage AI/ML model versions using MLflow, DVC, or Weights & Biases. Log model metadata (hyperparameters, training data hash, framework version), record evaluation metrics (accuracy, F1, latency), manage model registry transitions (Staging, Production, Archived), and generate model cards documenting lineage and performance.

Supporting Files

assets/README.md
assets/example_mlflow_workflow.yaml
assets/model_card_template.md
assets/versioning_diagram.png
references/README.md
scripts/README.md
scripts/version_control.sh

SKILL.md

69 lines · ~1.3k tokens

Stats

Maintenance: Excellent

Last commit: Apr 3, 2026

Model Versioning Tracker

Overview

Prerequisites

MLflow tracking server running locally or remotely (mlflow server or managed MLflow)
Python 3.9+ with mlflow, pandas, and the relevant ML framework installed
Model artifacts accessible on the local filesystem or cloud storage (S3, GCS)
Write access to the MLflow tracking URI and artifact store
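
The prerequisites above can be checked before any tracking work starts. The following is a minimal sketch, not part of the skill itself: the helper names `missing_packages` and `check_prerequisites` are hypothetical, and the package list covers only `mlflow` and `pandas`, so the relevant ML framework would need to be added by hand.

```python
import importlib.util
import os
import sys

# Package names taken from the prerequisites list; add your ML framework as needed.
REQUIRED_PACKAGES = ["mlflow", "pandas"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_prerequisites(packages=REQUIRED_PACKAGES, min_python=(3, 9)):
    """Collect readable problems instead of raising, so every issue is reported at once."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for name in missing_packages(packages):
        problems.append(f"package not installed: {name}")
    if not os.environ.get("MLFLOW_TRACKING_URI"):
        problems.append("MLFLOW_TRACKING_URI is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

This does not verify write access to the tracking server or the artifact store; that still has to be confirmed against the real server.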

Instructions

Connect to the MLflow tracking server by setting MLFLOW_TRACKING_URI, then verify connectivity with mlflow experiments search (mlflow experiments list on MLflow 1.x).
Create or select an MLflow experiment for the model project using mlflow experiments create --experiment-name <name>.
Log a new model version: start an MLflow run, log parameters (learning rate, epochs, batch size), log metrics (accuracy, loss, F1 score), and log the model artifact with mlflow.<flavor>.log_model().
Register the model in the MLflow Model Registry by calling mlflow.register_model() with the model URI (for example, runs:/<run_id>/model) and a descriptive model name.
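Transition the model version through stages (None -> Staging -> Production) using client.transition_model_version_stage(), and archive previous Production versions, for example by passing archive_existing_versions=True. Note that MLflow 2.9 and later deprecate stages in favor of model version aliases.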
Compare model versions by querying metrics across runs with mlflow.search_runs() and generating comparison tables showing metric improvements between versions.
Generate a model card from the registered model metadata, including training data description, evaluation metrics, intended use, limitations, and ethical considerations. See ${CLAUDE_SKILL_DIR}/assets/model_card_template.md.
Set up automated alerts for model performance degradation by comparing production metrics against baseline thresholds stored in the model registry.
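
The degradation check in the last step can be sketched as a pure comparison of current production metrics against stored baselines. This is an illustrative sketch under stated assumptions: the function name `find_degraded_metrics`, the relative `tolerance`, and the `lower_is_better` set are all hypothetical, and the baseline values are assumed to have already been read from the registry (for example from model version tags, which MLflow stores as strings and which would need converting to floats).

```python
def find_degraded_metrics(production, baseline, tolerance=0.02, lower_is_better=()):
    """Return {metric: (baseline, current)} for metrics that regressed beyond tolerance.

    `tolerance` is relative: 0.02 allows a 2% regression before flagging.
    Metrics named in `lower_is_better` (e.g. latency) regress when they grow.
    Metrics missing from `production` are skipped rather than flagged.
    """
    degraded = {}
    for name, base in baseline.items():
        if name not in production:
            continue
        current = production[name]
        if name in lower_is_better:
            regressed = current > base * (1 + tolerance)
        else:
            regressed = current < base * (1 - tolerance)
        if regressed:
            degraded[name] = (base, current)
    return degraded


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    baseline = {"accuracy": 0.91, "latency_ms": 15}
    production = {"accuracy": 0.85, "latency_ms": 15.1}
    alerts = find_degraded_metrics(production, baseline, lower_is_better={"latency_ms"})
    for metric, (base, current) in alerts.items():
        print(f"ALERT: {metric} moved from {base} to {current}")
```

How the alert is delivered (email, chat webhook, pager) is left open here, since the skill does not specify it.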

See ${CLAUDE_SKILL_DIR}/assets/example_mlflow_workflow.yaml for a complete workflow configuration.

Examples

Tracking a new image classification model version: Log a ResNet-50 fine-tuned on a custom dataset. Record hyperparameters (lr=0.001, epochs=50, optimizer=Adam), metrics (val_accuracy=0.94, val_loss=0.18, inference_latency_ms=12), and the serialized model artifact. Register as version 3 in the model registry and transition to Staging for validation.
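
A sketch of how this example might look in code, assuming a PyTorch model and the `mlflow.pytorch` flavor. The model name `resnet50-custom`, the experiment name `image-classification`, the fallback tracking URI, and the helper names are hypothetical. The registry assigns the version number itself, so "version 3" is whatever `register_model` returns rather than something the caller sets. `transition_model_version_stage` is deprecated in MLflow 2.9 and later, so treat that call as applying to stage-based registries only.

```python
import os

# Values taken from the example above.
PARAMS = {"lr": 0.001, "epochs": 50, "optimizer": "Adam"}
METRICS = {"val_accuracy": 0.94, "val_loss": 0.18, "inference_latency_ms": 12}


def model_uri_for_run(run_id, artifact_path="model"):
    """Build the runs:/ URI that mlflow.register_model() expects."""
    return f"runs:/{run_id}/{artifact_path}"


def log_and_stage(model, model_name="resnet50-custom", experiment="image-classification"):
    """Log one run, register its model, and move the new version to Staging."""
    # Imported lazily so the helpers above work without MLflow installed.
    import mlflow
    import mlflow.pytorch
    from mlflow.tracking import MlflowClient

    # Hypothetical fallback URI; normally MLFLOW_TRACKING_URI is already set.
    mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))
    mlflow.set_experiment(experiment)

    # Everything is logged inside the run context so nothing is lost when the run ends.
    with mlflow.start_run() as run:
        mlflow.log_params(PARAMS)
        mlflow.log_metrics(METRICS)
        mlflow.pytorch.log_model(model, artifact_path="model")

    version = mlflow.register_model(model_uri_for_run(run.info.run_id), model_name)
    MlflowClient().transition_model_version_stage(
        name=model_name, version=version.version, stage="Staging"
    )
    return version.version
```

Calling `log_and_stage(trained_model)` with a fine-tuned `torch.nn.Module` would return the version number the registry assigned.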

Comparing model versions before production promotion: Query MLflow for all versions of the sentiment-analysis model. Generate a comparison table showing accuracy improved from 0.87 (v2) to 0.91 (v3) while inference latency increased from 8ms to 15ms. Recommend promoting v3 to Production only if latency is acceptable for the use case.
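
The comparison table in this example can be produced with a few lines of plain Python. In this sketch the metric values are hard-coded from the example; in practice they would come from `mlflow.search_runs()`, which returns a pandas DataFrame with columns named like `metrics.<name>`. The function names `compare_versions` and `to_markdown` are hypothetical.

```python
def compare_versions(old_metrics, new_metrics):
    """Return (metric, old, new, delta) rows for metrics present in both versions."""
    shared = sorted(set(old_metrics) & set(new_metrics))
    return [(m, old_metrics[m], new_metrics[m], new_metrics[m] - old_metrics[m]) for m in shared]


def to_markdown(rows, old_label, new_label):
    """Render comparison rows as a Markdown table with signed deltas."""
    lines = [
        f"| Metric | {old_label} | {new_label} | Delta |",
        "| --- | --- | --- | --- |",
    ]
    for metric, old, new, delta in rows:
        lines.append(f"| {metric} | {old:g} | {new:g} | {delta:+.2f} |")
    return "\n".join(lines)


if __name__ == "__main__":
    v2 = {"accuracy": 0.87, "inference_latency_ms": 8}
    v3 = {"accuracy": 0.91, "inference_latency_ms": 15}
    print(to_markdown(compare_versions(v2, v3), "v2", "v3"))
```

The table only reports deltas. Whether a 7 ms latency increase is acceptable for a 0.04 accuracy gain remains a judgment for the use case, as the example says.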

Generating a model card for compliance review: Extract metadata from MLflow model registry version 5: training dataset (100K customer reviews), evaluation results (F1=0.89 on held-out test set), known limitations (struggles with sarcasm and multilingual input), and intended use (customer feedback classification). Output a structured Markdown model card.
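
A minimal sketch of rendering such a model card from a metadata dictionary. The section layout below is an assumption for illustration and is not the skill's own `model_card_template.md`. The model name `customer-feedback-classifier` is hypothetical; the other values come from the example above.

```python
def render_model_card(meta):
    """Render a Markdown model card from a plain metadata dictionary."""
    metrics = "\n".join(f"- {name}: {value}" for name, value in meta["metrics"].items())
    limitations = "\n".join(f"- {item}" for item in meta["limitations"])
    return (
        f"# Model Card: {meta['name']} (version {meta['version']})\n\n"
        f"## Intended Use\n{meta['intended_use']}\n\n"
        f"## Training Data\n{meta['training_data']}\n\n"
        f"## Evaluation\n{metrics}\n\n"
        f"## Limitations\n{limitations}\n"
    )


if __name__ == "__main__":
    card = render_model_card({
        "name": "customer-feedback-classifier",  # hypothetical model name
        "version": 5,
        "intended_use": "Customer feedback classification",
        "training_data": "100K customer reviews",
        "metrics": {"F1 (held-out test set)": 0.89},
        "limitations": ["struggles with sarcasm", "struggles with multilingual input"],
    })
    print(card)
```

In a real workflow the dictionary would be filled from the registry entry and its source run rather than typed by hand.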

Output

MLflow run with logged parameters, metrics, and model artifact
Model registry entry with version number and stage assignment
Version comparison table with metric deltas across runs
Model card in Markdown format documenting lineage, performance, and limitations

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| MLflow connection refused | Tracking server not running or wrong URI | Verify `MLFLOW_TRACKING_URI` is correct; start server with `mlflow server --host 0.0.0.0 --port 5000` |
| Artifact upload failed | Insufficient permissions on artifact store | Check S3/GCS bucket permissions; verify IAM role has write access to the artifact path |
| Model registration conflict | Model name already exists with incompatible schema | Use a versioned model name or delete the conflicting registry entry |
| Metrics not logged | MLflow run ended before logging completed | Ensure all `log_metric()` calls happen within the active run context (`with mlflow.start_run():`) |
| Stage transition denied | Model version already in target stage | Archive the existing version in that stage first, then retry the transition |

Resources

MLflow documentation: https://mlflow.org/docs/latest/index.html
MLflow Model Registry: https://mlflow.org/docs/latest/model-registry.html
DVC (Data Version Control): https://dvc.org/doc
Weights & Biases Model Registry: https://docs.wandb.ai/guides/model-registry
ML Model Cards: https://modelcards.withgoogle.com/about
