mlops-blueprints | Fullstack ML/AI Agent Skills

mlops-blueprints

Use this skill to produce concrete, right-sized MLOps blueprints: reference architectures and operational patterns for ML pipelines, deployment, monitoring, and governance. Output a decision-driven plan, not a generic checklist.

When to use

The user is designing or hardening an ML system (training pipeline, model serving, monitoring, retraining).
The user asks "how should we deploy/operate/monitor this model" or wants an MLOps reference architecture.
The user needs governance, reproducibility, or cost/reliability patterns for ML in production.

Non-goals

Do not prescribe a heavy platform when a simple pipeline + registry fits the team's scale.
Do not invent infrastructure the user did not ask for; recommend the minimum that meets the requirements.
Do not produce model or business logic; focus on the operational scaffolding around it.

How to work

Establish context first: problem type (batch vs online, real-time latency budget), data volume/velocity, team size, existing cloud/stack, and constraints (compliance, cost, on-prem).
Pick the smallest blueprint that satisfies the constraints; call out what you intentionally left out and when to add it.
Deliver an objective, numbered plan with concrete tool choices (and at least one alternative), plus the trade-offs.

Blueprint areas (cover the ones that apply)

1) Data & feature pipelines

Ingestion + validation (schema and distribution checks, e.g. Great Expectations / pandera); fail fast on bad data.
Reproducible transforms; versioned datasets (DVC / lakehouse table versions); a feature store only if features are shared across models or must be served consistently both online and offline.
Orchestration (Airflow / Prefect / Dagster / managed) with idempotent, retryable steps.
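A minimal sketch of the fail-fast validation step, using only the standard library as a stand-in for Great Expectations or pandera. The names ColumnRule, BadBatchError, and validate_batch are hypothetical, and the mean-range check is a deliberately crude substitute for a real distribution test.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ColumnRule:
    """Hypothetical rule: expected type plus an allowed range for the batch mean."""
    dtype: type
    min_mean: float
    max_mean: float


class BadBatchError(ValueError):
    """Raised to stop the pipeline before bad data reaches training."""


def validate_batch(rows, rules):
    """Check schema (column present, type) and a crude distribution bound.

    Raises BadBatchError on the first violation so the run fails fast.
    """
    if not rows:
        raise BadBatchError("empty batch")
    for name, rule in rules.items():
        values = []
        for i, row in enumerate(rows):
            if name not in row:
                raise BadBatchError(f"row {i}: missing column {name!r}")
            if not isinstance(row[name], rule.dtype):
                raise BadBatchError(f"row {i}: {name!r} is not {rule.dtype.__name__}")
            values.append(row[name])
        batch_mean = mean(values)
        if not rule.min_mean <= batch_mean <= rule.max_mean:
            raise BadBatchError(
                f"{name!r} mean {batch_mean:.2f} outside "
                f"[{rule.min_mean}, {rule.max_mean}]"
            )
    return rows
```

Because the function either returns the batch unchanged or raises, it can be re-run safely, which fits the idempotent, retryable step model of the orchestrators listed above.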

2) Training & experiment tracking

Track params, metrics, code/data versions, and artifacts (MLflow / Weights & Biases).
Make training reproducible: pinned dependencies (e.g. a uv lockfile), fixed seeds, a captured environment, and containerized jobs.
Define promotion criteria (eval thresholds) before training, not after.
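One way to make promotion criteria explicit before training is to commit them as data and evaluate every candidate against them. The sketch below assumes hypothetical metric names (auc, p95_latency_ms) and a hypothetical PromotionCriteria type; the thresholds belong to the team, not to this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromotionCriteria:
    """Hypothetical thresholds, agreed and versioned before the training run."""
    min_auc: float
    max_p95_latency_ms: float
    min_gain_over_baseline: float


def should_promote(metrics, baseline_auc, criteria):
    """Return (decision, reasons); reasons lists every failed criterion."""
    reasons = []
    if metrics["auc"] < criteria.min_auc:
        reasons.append(f"auc {metrics['auc']:.3f} < {criteria.min_auc:.3f}")
    if metrics["p95_latency_ms"] > criteria.max_p95_latency_ms:
        reasons.append(
            f"p95 latency {metrics['p95_latency_ms']:.0f} ms "
            f"> {criteria.max_p95_latency_ms:.0f} ms"
        )
    gain = metrics["auc"] - baseline_auc
    if gain < criteria.min_gain_over_baseline:
        reasons.append(f"gain {gain:.3f} < {criteria.min_gain_over_baseline:.3f}")
    return (not reasons, reasons)
```

Returning the failed criteria alongside the decision gives the experiment tracker something concrete to log with the run.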

3) Model packaging & registry

Version models in a registry with stage transitions (staging → production) and lineage back to data/code.
Standard artifact format (e.g. ONNX / framework-native + signature); record input/output schema.
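The registry requirements above can be sketched as a small record plus a guarded stage transition. This is an in-memory illustration, not the API of MLflow or any other registry; ModelVersion, ALLOWED, transition, and the stage names "none" and "archived" are assumptions added for the example.

```python
from dataclasses import dataclass

# Assumed stage graph: a version must pass through staging before production.
ALLOWED = {
    "none": {"staging"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}


@dataclass
class ModelVersion:
    """Hypothetical registry entry carrying lineage and the I/O schema."""
    name: str
    version: int
    data_version: str    # e.g. a DVC revision or table version
    code_commit: str     # commit that produced the artifact
    input_schema: dict   # column name -> type name
    output_schema: dict
    stage: str = "none"


def transition(model, new_stage):
    """Move a version to new_stage, rejecting transitions outside ALLOWED."""
    if new_stage not in ALLOWED[model.stage]:
        raise ValueError(f"cannot move {model.name} v{model.version} "
                         f"from {model.stage!r} to {new_stage!r}")
    model.stage = new_stage
    return model
```

Keeping data_version and code_commit on the record is what lets a served prediction be traced back to the data and code that produced the model.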

4) Deployment & serving

Match the pattern to latency/throughput: batch scoring, online endpoint (FastAPI/BentoML/KServe/managed), or streaming.
Progressive delivery: shadow / canary / blue-green; keep rollback trivial.
Decouple model artifacts from app code so models update without full redeploys.
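A canary split can be as simple as deterministic hashing on a request or user ID, so the same caller always lands on the same variant. The sketch below assumes a hypothetical route function and variant names "stable" and "canary"; in practice a service mesh or the serving platform often does this instead.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of IDs to the canary, deterministically.

    Setting canary_percent to 0 is the rollback: all traffic returns to stable.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Because the percentage is the only knob, rollback is a configuration change rather than a redeploy, which keeps it trivial as recommended above.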

5) Monitoring & observability

Operational: latency (p50/p95), error rate, throughput, saturation, cost.
Data/model: input drift, prediction drift, data-quality breaches, and (when labels arrive) live quality metrics.
Alerting with actionable thresholds; a retraining trigger (scheduled and/or drift-based) with a human gate.
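As an illustration of a drift-based trigger with a human gate, the sketch below computes the Population Stability Index (PSI) between a reference sample and live inputs. PSI is one common choice, not one prescribed here; the 0.2 threshold is a widely quoted rule of thumb used as an assumption, and retraining_decision is a hypothetical helper.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins fitted on `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def retraining_decision(psi_value, threshold=0.2, approved_by=None):
    """Drift only proposes retraining; a named approver must release it."""
    if psi_value < threshold:
        return "no_action"
    return "retrain" if approved_by else "awaiting_approval"
```

The same function can run on prediction scores to track prediction drift; live quality metrics still need labels and are not covered by this sketch.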

6) Governance, security & cost

Reproducibility & audit: lineage from prediction → model version → data → code.
Access control for data/models/secrets; PII handling; model/data cards for documentation.
Cost controls: right-sized compute, autoscaling, spot/batch where safe; track cost per prediction.
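Cost per prediction is simple arithmetic once compute spend and request volume are known. The sketch below uses a hypothetical cost_per_prediction helper and covers serving compute only; storage, network, and training costs are left out.

```python
def cost_per_prediction(hourly_rate_usd, replicas, hours, predictions):
    """Serving compute cost divided by predictions served in the same window."""
    if predictions <= 0:
        raise ValueError("predictions must be positive")
    return hourly_rate_usd * replicas * hours / predictions
```

With made-up numbers, 2 replicas at $0.50/hour for 720 hours serving 3,600,000 predictions cost $720 in total, or $0.0002 per prediction.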

Deliverables checklist

Context summary: problem type, latency/scale, constraints, existing stack.
A numbered reference architecture (diagram-in-text is fine) with concrete tool choices + one alternative each.
Per-area decisions for the areas that apply, with trade-offs and what was deliberately deferred.
A rollout/retraining + monitoring plan (what is watched, thresholds, who is alerted, how to roll back).
A short list of first concrete steps to implement the blueprint.