Skill

Qml-pipeline

Designs production-grade ML pipelines with experiment tracking (MLflow/W&B), orchestration DAGs (Kubeflow/Airflow), feature stores (Feast), model registries, and automated retraining.

Python

Airflow

data-engineering

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/qe-framework:Qml-pipeline

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.

Supporting Files

references/experiment-tracking.md
references/feature-engineering.md
references/model-validation.md
references/pipeline-orchestration.md
references/training-pipelines.md

SKILL.md

133 lines · ~1.5k tokens

Stats

Language: JavaScript

Stars: 5

Maintenance: Excellent

Last Commit: Jun 15, 2026

ML Pipeline Expert

Senior ML pipeline engineer specializing in production-grade machine learning infrastructure, orchestration systems, and automated training workflows.

Core Workflow

Design — Map data flow, identify stages, define component interfaces
Validate — Run schema & distribution checks before training; halt on failures
Feature — Build transformation pipelines, feature stores, validation checks
Orchestrate — Configure distributed training, hyperparameter tuning, resource allocation
Track — Log metrics, parameters, artifacts; enable comparison & reproducibility
Validate & Deploy — Implement evaluation gates; run A/B testing before promotion
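
The "halt on failures" rule in the Validate step can be sketched as a small stage runner that stops at the first failing stage. This is a framework-free illustration only: `run_stages`, `StageFailure`, and the two example stages are hypothetical names, standing in for whatever an orchestrator such as Airflow or Kubeflow would provide.

```python
from typing import Any, Callable


class StageFailure(RuntimeError):
    """Raised when a pipeline stage fails and later stages must not run."""


def run_stages(stages: list[tuple[str, Callable[[Any], Any]]], payload: Any) -> tuple[Any, list[str]]:
    """Run stages in order, passing each stage's output to the next.

    Returns the final payload and the names of the stages that completed.
    Any exception in a stage halts the pipeline immediately.
    """
    completed: list[str] = []
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            raise StageFailure(f"stage '{name}' failed after {completed}: {exc}") from exc
        completed.append(name)
    return payload, completed


def validate(rows: list[dict]) -> list[dict]:
    # Hypothetical schema check: every row needs a numeric "x" value.
    for row in rows:
        if not isinstance(row.get("x"), (int, float)):
            raise ValueError(f"bad row: {row}")
    return rows


def featurize(rows: list[dict]) -> list[dict]:
    # Hypothetical feature step: add a squared feature.
    return [{**row, "x_sq": row["x"] ** 2} for row in rows]


if __name__ == "__main__":
    result, done = run_stages([("validate", validate), ("feature", featurize)], [{"x": 2}])
    print(done, result)
```

Because the validation stage raises before the feature stage runs, bad input never reaches training. A real DAG would express the same dependency through task ordering rather than a loop.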

Code Patterns (3 Examples with Docstrings)

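# Pattern 1: Feature store integration
def build_feature_store(feast_repo_path: str, entity_df, feature_list: list):
    """Initialize Feast feature store and load historical features for training.

    entity_df holds the entity keys and an event_timestamp column used for
    the point-in-time join.
    """
    from feast import FeatureStore
    fs = FeatureStore(repo_path=feast_repo_path)
    # get_historical_features returns a RetrievalJob; to_df() materializes it
    job = fs.get_historical_features(entity_df=entity_df, features=feature_list)
    return job.to_df()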

# Pattern 2: MLflow experiment logging
def log_training_run(params: dict, metrics: dict, artifacts: list, run_name: str) -> str:
    """Log a training run's params, metrics, and artifact files; return the run ID."""
    import mlflow
    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # Artifacts are local file paths (e.g. saved model files, plots)
        for artifact_path in artifacts:
            mlflow.log_artifact(artifact_path)
        return run.info.run_id

# Pattern 3: Data validation checkpoint
def validate_pipeline_input(df, expected_schema: dict, min_rows: int = 100):
    """Validate row count, required columns, and dtypes before pipeline execution.

    Raises ValueError instead of using assert, because assert statements are
    stripped when Python runs with -O.
    """
    if df.shape[0] < min_rows:
        raise ValueError(f"Insufficient rows: {df.shape[0]} < {min_rows}")
    for col, dtype in expected_schema.items():
        if col not in df.columns:
            raise ValueError(f"Missing column: {col}")
        if str(df[col].dtype) != dtype:
            raise ValueError(f"Type mismatch {col}: {df[col].dtype} != {dtype}")
    return df
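
Pattern 3 covers row count and schema, while the Validate step in the workflow also asks for distribution checks. As one possible sketch, the function below flags a numeric column whose mean has moved too far from a reference sample, measured in reference standard deviations. The name `check_mean_shift` and the 0.5 default tolerance are illustrative assumptions, not part of the skill; production pipelines often use tests such as PSI or Kolmogorov-Smirnov instead.

```python
import statistics


def check_mean_shift(reference: list[float], current: list[float], max_shift_in_std: float = 0.5) -> float:
    """Return the shift of the current mean from the reference mean, in reference std units.

    Raises ValueError when the shift exceeds max_shift_in_std (an assumed tolerance).
    """
    if len(reference) < 2 or not current:
        raise ValueError("need at least two reference values and one current value")
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    if ref_std == 0:
        raise ValueError("reference sample has zero variance; shift is undefined")
    shift = abs(statistics.fmean(current) - ref_mean) / ref_std
    if shift > max_shift_in_std:
        raise ValueError(f"distribution shift {shift:.2f} std exceeds {max_shift_in_std}")
    return shift
```

Called once per numeric column with values from the training baseline and the incoming batch, this gives a cheap gate that can run alongside the schema check.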

Comment Template (Google-style)

def orchestrate_training_pipeline(config_path: str, experiment_name: str):
    """One-line orchestration strategy summary.
    
    Longer: feature engineering, parallelization, validation gates, registry.
    
    Args:
        config_path: Path to YAML pipeline configuration
        experiment_name: MLflow experiment identifier
    
    Returns:
        Registered model URI from registry
    
    Raises:
        FileNotFoundError: If config not found
        ValueError: If validation gates fail
    """

Lint Rules (ruff/mypy/black)

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "W", "UP"]

[tool.mypy]
python_version = "3.9"
disallow_untyped_defs = true
ignore_missing_imports = true

Security Checklist (5+)

Model poisoning — Validate data integrity; use DVC, checksums, distribution shift detection
Data privacy leakage — Never log raw data; use aggregates + differential privacy
Artifact signing — Sign model artifacts; enforce signature validation on load
Credential exposure — Use secrets manager & env vars; never hardcode keys in DAGs
Training-serving skew — Version feature definitions; validate stats match within tolerance
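
The artifact-signing and credential items above can be illustrated with the standard library alone. The sketch below uses HMAC-SHA256 as a simple symmetric stand-in for a signing scheme; real deployments may prefer asymmetric signatures. The function names and the `MODEL_SIGNING_KEY` environment variable are hypothetical.

```python
import hashlib
import hmac
import os


def file_sha256(path: str) -> bytes:
    """Stream the file through SHA-256 so large artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.digest()


def sign_artifact(path: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the artifact's SHA-256 digest."""
    return hmac.new(key, file_sha256(path), hashlib.sha256).hexdigest()


def verify_artifact(path: str, key: bytes, signature: str) -> bool:
    """Check the signature in constant time before the artifact is loaded."""
    return hmac.compare_digest(sign_artifact(path, key), signature)


def load_signing_key(env_var: str = "MODEL_SIGNING_KEY") -> bytes:
    """Read the key from the environment (hypothetical variable name); never hardcode it."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value.encode("utf-8")
```

A loader would call `verify_artifact` and refuse to deserialize the model when it returns `False`, so a modified file is rejected before any of its contents are executed.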

Anti-patterns (5 Wrong/Correct)

Anti-pattern	Fix
No experiment tracking; manual CSV logs	Use MLflow, W&B, or Neptune for every run; log params and metrics
Skipped validation; train on all data	Run schema checks, hold out validation and test splits, and log held-out test metrics
No versioning; "latest" model only	Use DVC for data, Git tags for code, model registry for artifacts
Different training & serving code paths	Single feature transform code; validate equivalence in tests
Single hyperparameter run; no tuning	Use Ray Tune, Optuna, or grid search; track all runs
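
The last row's fix (search a grid and track every run) can be sketched without any tuning library. This is a minimal illustration, not a replacement for Ray Tune or Optuna; `grid_search` and `toy_objective` are hypothetical names, and the objective stands in for "train a model and return a validation score".

```python
import itertools
from typing import Callable


def grid_search(objective: Callable[..., float], grid: dict[str, list]) -> tuple[dict, list[dict]]:
    """Evaluate every combination in the grid and keep a record of each run.

    Returns the best run (highest score) and the full list of runs, so nothing
    is discarded and runs can be compared or logged to a tracker afterwards.
    """
    names = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        runs.append({"params": params, "score": objective(**params)})
    best = max(runs, key=lambda run: run["score"])
    return best, runs


def toy_objective(max_depth: int, n_estimators: int) -> float:
    # Hypothetical stand-in for training a model and scoring it on validation data.
    return -abs(max_depth - 5) - abs(n_estimators - 100) / 100
```

Each entry in the returned list could be written to an experiment tracker as its own run, which keeps the losing configurations available for later comparison.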

MLflow Quick Start

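import mlflow
import mlflow.sklearn
from sklearn.metrics import accuracy_score

# Assumes model (a scikit-learn estimator), X_train, y_train, X_test, and y_test
# are already defined.
mlflow.set_experiment("my-experiment")
with mlflow.start_run():
    mlflow.log_params({"n_estimators": 100, "max_depth": 5})
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.sklearn.log_model(model, "model", registered_model_name="my-model")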

MUST DO / MUST NOT DO

MUST: Version all data/code/models (DVC, Git, registry), pin seeds, validate data, log all params, track experiments, sign artifacts
MUST NOT: Train without tracking, skip validation, hardcode credentials, ignore training-serving skew, deploy without evaluation gates
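
As an illustration of the "pin seeds" requirement, the sketch below seeds Python's built-in RNG and builds a reproducible train/validation split. It is stdlib-only by assumption; `pin_seeds` and `sample_split` are hypothetical helpers, and libraries such as NumPy or PyTorch would need their own seeding calls.

```python
import os
import random


def pin_seeds(seed: int) -> None:
    """Seed Python's built-in RNG and record the seed for child processes.

    Libraries such as NumPy or PyTorch keep their own RNG state and need
    their own seeding calls; they are left out to keep this stdlib-only.
    """
    random.seed(seed)
    # PYTHONHASHSEED only takes effect for interpreters started after this
    # point (e.g. subprocess workers), not for the current process.
    os.environ["PYTHONHASHSEED"] = str(seed)


def sample_split(n_rows: int, val_fraction: float, seed: int) -> tuple[list[int], list[int]]:
    """Return reproducible train/validation row indices for a given seed."""
    rng = random.Random(seed)  # local RNG so other code cannot disturb the split
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_val = int(n_rows * val_fraction)
    return indices[n_val:], indices[:n_val]
```

Logging the seed alongside the other run parameters lets a later run rebuild exactly the same split.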
