mlops-blueprints
Use this skill to produce concrete, right-sized MLOps blueprints: reference
architectures and operational patterns for ML pipelines, deployment, monitoring,
and governance. Output a decision-driven plan, not a generic checklist.
When to use
- The user is designing or hardening an ML system (training pipeline, model serving, monitoring, retraining).
- The user asks "how should we deploy/operate/monitor this model" or wants an MLOps reference architecture.
- The user needs governance, reproducibility, or cost/reliability patterns for ML in production.
Non-goals
- Do not prescribe a heavy platform when a simple pipeline + registry fits the team's scale.
- Do not invent infrastructure the user did not ask for; recommend the minimum that meets the requirements.
- Do not produce model/business logic — focus on the operational scaffolding around it.
How to work
- Establish context first: problem type (batch vs online, real-time latency budget), data volume/velocity, team size, existing cloud/stack, and constraints (compliance, cost, on-prem).
- Pick the smallest blueprint that satisfies the constraints; call out what you intentionally left out and when to add it.
- Deliver an objective, numbered plan with concrete tool choices (and at least one alternative), plus the trade-offs.
Blueprint areas (cover the ones that apply)
1) Data & feature pipelines
- Ingestion + validation (schema and distribution checks, e.g. Great Expectations / pandera); fail fast on bad data.
- Reproducible transforms; versioned datasets (DVC / lakehouse table versions); a feature store only if features are shared across models or across online and offline paths.
- Orchestration (Airflow / Prefect / Dagster / managed) with idempotent, retryable steps.
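The fail-fast validation step above can be sketched in plain Python. This is a minimal illustration, assuming rows arrive as dicts; `ColumnRule`, `validate_rows`, `DataValidationError`, and the example `SCHEMA` are hypothetical names, and a real pipeline would more likely express the same rules in Great Expectations or pandera.

```python
from dataclasses import dataclass
from typing import Optional


class DataValidationError(ValueError):
    """Raised on the first row that violates the schema."""


@dataclass(frozen=True)
class ColumnRule:
    types: tuple                       # accepted Python types for the column
    min_value: Optional[float] = None  # inclusive lower bound, if any
    max_value: Optional[float] = None  # inclusive upper bound, if any
    nullable: bool = False


def validate_rows(rows, schema):
    """Check every row against the schema; raise on the first violation."""
    for i, row in enumerate(rows):
        missing = sorted(set(schema) - set(row))
        if missing:
            raise DataValidationError(f"row {i}: missing columns {missing}")
        for name, rule in schema.items():
            value = row[name]
            if value is None:
                if rule.nullable:
                    continue
                raise DataValidationError(f"row {i}: {name} is null")
            if not isinstance(value, rule.types):
                raise DataValidationError(
                    f"row {i}: {name} has type {type(value).__name__}"
                )
            if rule.min_value is not None and value < rule.min_value:
                raise DataValidationError(f"row {i}: {name}={value} below minimum")
            if rule.max_value is not None and value > rule.max_value:
                raise DataValidationError(f"row {i}: {name}={value} above maximum")
    return len(rows)  # number of rows that passed


# Hypothetical schema, for illustration only.
SCHEMA = {
    "user_id": ColumnRule(types=(int,)),
    "age": ColumnRule(types=(int, float), min_value=0, max_value=120),
    "country": ColumnRule(types=(str,), nullable=True),
}
```

Run a check like this as the first task of the orchestrated pipeline so a bad batch stops the run before any transform or training step consumes it.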
2) Training & experiment tracking
- Track params, metrics, code/data versions, and artifacts (MLflow / Weights & Biases).
- Make training reproducible: pinned dependencies (e.g. a uv lockfile), fixed seeds, captured environment; containerized jobs.
- Define promotion criteria (eval thresholds) before training, not after.
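One way to make "criteria before training" concrete is to commit the thresholds as data and gate promotion on them. The sketch below is illustrative only: the metric names, threshold values, and the `evaluate_promotion` helper are assumptions, not part of any tracking tool's API.

```python
import operator

# Comparison operators allowed in promotion criteria.
_OPS = {">=": operator.ge, "<=": operator.le}

# Hypothetical criteria, agreed and committed before the training run.
PROMOTION_CRITERIA = {
    "auc": (">=", 0.85),
    "p95_latency_ms": ("<=", 120.0),
}


def evaluate_promotion(metrics, criteria=PROMOTION_CRITERIA):
    """Return (passed, failures) for a candidate model's evaluation metrics."""
    failures = []
    for name, (op, threshold) in criteria.items():
        if name not in metrics:
            # A missing metric counts as a failure, never as a silent pass.
            failures.append(f"{name}: metric missing")
            continue
        if not _OPS[op](metrics[name], threshold):
            failures.append(f"{name}: {metrics[name]} is not {op} {threshold}")
    return (not failures, failures)
```

Log the returned failures alongside the run in the experiment tracker so a rejected candidate is explainable later.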
3) Model packaging & registry
- Version models in a registry with stage transitions (staging → production) and lineage back to data/code.
- Standard artifact format (e.g. ONNX / framework-native + signature); record input/output schema.
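The registry fields above can be sketched as a plain record. This is a hypothetical, in-memory illustration: `build_registry_record`, `transition`, the field names, and the "archived" stage are assumptions made for the example, and a real registry (e.g. MLflow) has its own API and stage model.

```python
import hashlib

# Allowed stage changes; anything else is rejected.
ALLOWED_TRANSITIONS = {
    ("staging", "production"),
    ("staging", "archived"),
    ("production", "archived"),
}


def build_registry_record(name, version, artifact_bytes, input_schema,
                          output_schema, data_version, code_commit):
    """Describe one model version with its signature and lineage."""
    return {
        "name": name,
        "version": version,
        "stage": "staging",  # new versions always start in staging
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "signature": {"inputs": input_schema, "outputs": output_schema},
        "lineage": {"data_version": data_version, "code_commit": code_commit},
    }


def transition(record, new_stage):
    """Return a copy of the record in the new stage, or raise if not allowed."""
    if (record["stage"], new_stage) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"cannot move {record['stage']} -> {new_stage}")
    return {**record, "stage": new_stage}
```

The content hash ties the registry entry to the exact artifact bytes, which is what makes the lineage auditable rather than descriptive.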
4) Deployment & serving
- Match the pattern to latency/throughput: batch scoring, online endpoint (FastAPI/BentoML/KServe/managed), or streaming.
- Progressive delivery: shadow / canary / blue-green; keep rollback trivial.
- Decouple model artifacts from app code so models update without full redeploys.
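A canary split can be sketched as deterministic, hash-based routing, so the same request or user ID always lands on the same model version. The function name `route` and the 10,000-bucket resolution are assumptions for illustration; serving platforms such as KServe provide their own traffic-splitting configuration.

```python
import hashlib

_BUCKETS = 10_000  # routing resolution: 0.01% of traffic per bucket


def route(request_id: str, canary_fraction: float) -> str:
    """Return "canary" or "stable" for a request, deterministically by ID."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto an integer bucket.
    bucket = int.from_bytes(digest[:8], "big") % _BUCKETS
    return "canary" if bucket < round(canary_fraction * _BUCKETS) else "stable"
```

With this shape, rollback is a configuration change: setting `canary_fraction` to 0 sends all traffic back to the stable version without a redeploy.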
5) Monitoring & observability
- Operational: latency (p50/p95), error rate, throughput, saturation, cost.
- Data/model: input drift, prediction drift, data-quality breaches, and (when labels arrive) live quality metrics.
- Alerting with actionable thresholds; a retraining trigger (scheduled and/or drift-based) with a human gate.
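Input drift and the gated retraining trigger can be sketched with the Population Stability Index (PSI) on a single numeric feature. This is an assumption-laden illustration: PSI is one of several possible drift measures, the 0.2 alert threshold is a common rule of thumb rather than a universal constant, and `psi` and `retraining_decision` are hypothetical helper names.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` baseline.

    Both inputs are non-empty lists of numbers. Bin edges come from the
    baseline's range; out-of-range values are clamped into the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = ((hi - lo) / bins) or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def retraining_decision(psi_value, threshold=0.2, approved_by=None):
    """Drift raises a request; retraining only proceeds after human approval."""
    if psi_value < threshold:
        return "no_action"
    return "retrain" if approved_by else "awaiting_approval"
```

Keeping the approval as an explicit argument means an automated drift alert alone can never start a retraining job.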
6) Governance, security & cost
- Reproducibility & audit: lineage from prediction → model version → data → code.
- Access control for data/models/secrets; PII handling; model/data cards for documentation.
- Cost controls: right-sized compute, autoscaling, spot/batch where safe; track cost per prediction.
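Cost per prediction is simple arithmetic, but writing it down makes the unit explicit. The sketch below assumes compute is the dominant cost and is billed per replica-hour; the function name and all figures in the usage note are hypothetical.

```python
def cost_per_prediction(hourly_rate, replicas, hours, predictions):
    """Serving compute cost divided by predictions served in the same window."""
    if predictions <= 0:
        raise ValueError("predictions must be positive")
    return (hourly_rate * replicas * hours) / predictions
```

For example, two replicas at a hypothetical 0.50 per hour over 24 hours serving 1,200,000 predictions cost 24.00 in total, or 0.00002 per prediction (0.02 per thousand). Track the same ratio per model version so a cheaper or more expensive candidate is visible at promotion time.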
Deliverables checklist
- Context summary: problem type, latency/scale, constraints, existing stack.
- A numbered reference architecture (diagram-in-text is fine) with concrete tool choices + one alternative each.
- Per-area decisions for the areas that apply, with trade-offs and what was deliberately deferred.
- A rollout/retraining + monitoring plan (what is watched, thresholds, who is alerted, how to roll back).
- A short list of first concrete steps to implement the blueprint.