From data-architecture
Design batch and streaming data pipelines. Plan ingestion, transformation, quality checks, and failure recovery. Use when building ETL/ELT systems or data infrastructure.
Slash command
/data-architecture:data-pipeline-design
Design robust, maintainable data pipelines that reliably move, transform, and validate data at scale.
You are designing data pipelines (batch or streaming). Plan data flow, transformations, quality gates, failure recovery, and monitoring. Read source systems, target requirements, latency expectations, and volume projections.
Based on modern data engineering practices (Spark, Airflow, Kafka, Beam):
Choose Processing Model: Decide between batch (for example, daily jobs), streaming (for example, real-time features), or a hybrid such as the Lambda architecture, which pairs a streaming path for speed with a batch path for accuracy. Weigh the latency SLA against cost.
Design Data Stages: Raw ingestion (as-is from source) → Bronze. Cleansing and normalization → Silver. Business logic and enrichment → Gold. This layered medallion architecture separates concerns.
Implement Quality Gates: Validate data at each stage and fail the pipeline if quality drops below a defined threshold. Track anomalies such as unexpected null rates, shifts in value distributions, and cardinality changes.
Handle Failures and Recovery: Idempotent transformations allow safe retries. Checkpoint state for streaming pipelines; resume from last checkpoint on failure. Use dead-letter queues for unparseable records.
Plan Monitoring and Alerting: Track freshness (when was the last successful run?), latency (time from source to sink), volume (record counts by stage), and error rates. Alert on anomalies and SLA misses.
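The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the record fields (order_id, amount), the function names, the in-memory dict standing in for a Gold table, and the thresholds are all hypothetical. A real pipeline would express the same ideas in its own engine (Spark, Beam, and so on).

```python
import json
from datetime import datetime, timedelta, timezone


class QualityGateError(Exception):
    """Raised when a stage's output fails its quality check."""


def to_silver(raw_lines):
    """Bronze -> Silver: parse and normalize each raw line.

    Unparseable records go to a dead-letter list instead of failing the run.
    """
    silver, dead_letter = [], []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            amount = rec.get("amount")
            silver.append({
                "order_id": str(rec["order_id"]),  # hypothetical key field
                "amount": None if amount is None else float(amount),
            })
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            dead_letter.append({"raw": line, "error": repr(exc)})
    return silver, dead_letter


def check_null_rate(records, field, max_null_rate):
    """Quality gate: fail the run if too many records are missing `field`."""
    if not records:
        raise QualityGateError("no records reached this stage")
    null_rate = sum(r[field] is None for r in records) / len(records)
    if null_rate > max_null_rate:
        raise QualityGateError(
            f"{field} null rate {null_rate:.2f} exceeds {max_null_rate:.2f}"
        )
    return null_rate


def load_gold(target, records):
    """Silver -> Gold: idempotent upsert keyed by order_id.

    Re-running the same batch leaves `target` unchanged, so retries are safe.
    """
    for r in records:
        if r["amount"] is not None:
            target[r["order_id"]] = {"amount": r["amount"]}
    return target


def is_stale(last_success, now, max_age):
    """Freshness check: True if the last successful run is older than max_age."""
    return now - last_success > max_age


if __name__ == "__main__":
    bronze = [
        '{"order_id": 1, "amount": "9.50"}',
        '{"order_id": 2, "amount": null}',
        "not json",
    ]
    silver, dlq = to_silver(bronze)
    check_null_rate(silver, "amount", max_null_rate=0.5)
    gold = load_gold({}, silver)
    load_gold(gold, silver)  # simulated retry: same result
    now = datetime.now(timezone.utc)
    print(len(silver), len(dlq), gold, is_stale(now, now, timedelta(hours=24)))
```

Keying the Gold write on a stable identifier is what makes the retry safe here. An append-only load would duplicate rows on every rerun, so it would need a separate deduplication step.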
npx claudepluginhub sethdford/claude-skills --plugin architect-data-architecture
Designs data pipelines using functional principles: idempotency, immutability, declarative transformations. Guides on ELT, partitioning, dbt layers, data quality tests, and DAG orchestration.
Designs scalable data pipelines for batch and streaming processing. Covers ETL/ELT, Lambda, Kappa, Lakehouse architectures, orchestration (Airflow/Prefect), dbt transformations, and data quality frameworks.
Designs data pipelines and ETL processes covering extraction, transformation, loading, data quality checks, orchestration, and patterns for batch, streaming, CDC, ELT. Useful for building pipelines, data flows, syncing, or moving data between systems.