From dak
Recommends and guides GCP data pipeline tools — dbt, Dataflow, Dataform, Dataproc Spark, BigQuery DTS, Cloud Composer — based on workspace files or user requirements.
How this skill is triggered — by the user, by Claude, or both
Slash command
/dak:gcp-data-pipelinesThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Expert guidance for navigating and building **data pipelines on Google Cloud
Expert guidance for navigating and building data pipelines on Google Cloud Platform (GCP) using the right tool for the job.
Act as a GCP Data Solutions Architect.
You MUST scan the workspace for existing pipeline indicators before asking or recommending anything:
| Framework | Indicator File / Content |
|---|---|
| Dataflow | .java files containing import org.apache.beam, .py |
: : files containing import apache_beam : | |
| Dataform | workflow_settings.yaml or dataform.json |
| dbt | dbt_project.yml |
| Spark | .ipynb or .py files containing import pyspark |
| Airflow | .py |
| Provisioning | deployment.yaml |
| Orchestration | deployment.yaml or *-pipeline.yaml |
dbt_project.yml, workflow_settings.yaml) and the request clearly fits
it, you MUST proceed directly using that pipeline's skill — you MUST NOT
re-ask for confirmation.deployment.yaml or *-pipeline.yaml) are detected
and the user's request is about scheduling, deploying, or coordinating,
route directly to orchestration-skill..py), it may not be necessarily Spark; it can
be Airflow or something else. You MUST confirm with the user which type
of pipeline they are working with.If the user has not specified a tool, you MUST present the following GCP pipeline options with a brief summary to help them choose:
Data pipeline tools — pick one to build or transform data:
| Option | Best For | Skill |
|---|---|---|
| BigQuery DTS | Managed ingestion | bigquery-data-transfer-service |
| : : from datasources : : | ||
| dbt | SQL-first teams; | dbt-bigquery |
| : : modular models with : : | ||
| : : built-in tests & : : | ||
| : : docs; all transforms : : | ||
| : : run inside BigQuery : : | ||
| Dataflow | Streaming pipelines; | gcp-dataflow |
| : : Apache Beam; Unified : : | ||
| : : stream and batch : : | ||
| : : processing; : : | ||
| : : High-throughput : : | ||
| : : Pubsub integration; : : | ||
| : : ML Preprocessing and : : | ||
| : : Inference at scale; : : | ||
| : : Advanced : : | ||
| : : observability; : : | ||
| : : Serverless data : : | ||
| : : processing : : | ||
| Dataform | Google-native ELT; | dataform-bigquery |
| : : GCP Console : : | ||
| : : integration; SQLX/JS : : | ||
| : : for complex : : | ||
| : : dependency management : : | ||
| **Spark (Dataproc | Large-scale data; | gcp-spark |
| : Serverless)** : PySpark/Java/Scala; : : | ||
| : : ML preprocessing; : : | ||
| : : Iceberg/BigLake : : | ||
| Other | Data Fusion, or | — |
| : : generic Python — : : | ||
| : : proceed with general : : | ||
| : : GCP assistance : : |
Deployment & Orchestration — used to provision infrastructure and coordinate multiple pipelines already in the repo:
| Option | Best For | Skill |
|---|---|---|
| **Cloud | GCP Data Pipeline | gcp-pipeline-orchestration |
| : Composer** : Orchestration : : | ||
| : : deploy/schedule : : | ||
| : : existing : : | ||
| : : pipelines(dbt + : : | ||
| : : Spark, etc.). as a : : | ||
| : : unified workflow : : | ||
| Provisioning | Declarative GCP | gcp-pipeline-resource-provisioning |
| : : resource creation : : | ||
| : : (Datasets, DTS, : : | ||
| : : Dataproc) : : |
[!TIP]
If the user mentions scheduling, automating, cron, or coordinating existing scripts, queries, or notebooks — highlight Cloud Composer / Orchestration as the most likely fit.
[!NOTE]
Based on any hints in the user's request (data size, language preference, source/destination, complexity), you SHOULD briefly highlight the most likely fit before asking them to confirm.
[!IMPORTANT]
You MUST stop and wait for the user to select one of the options above. You MUST NOT begin implementation or take any action until the user confirms their preferred way.
If the user asks to "run the pipeline", you MUST clarify their intent using a two-step process:
Clarify Scope: First, if multiple pipelines or components are detected in the workspace (e.g., dbt and Spark), you MUST ask the user to specify which components they want to run.
Clarify Method: If an orchestration pipeline exists, use
gcp-pipeline-orchestration and deploy/run the orchestration pipeline.
Otherwise, you MUST ask the user how they want to run it:
dbt run, gcloud dataproc jobs submit,
dataform run etc.).@skill:gcp-pipeline-orchestration skill for more
context.Once the user confirms, activate the corresponding skill:
| Choice | Skill to Activate |
|---|---|
| BigQuery DTS | bigquery-data-transfer-service |
| dbt | dbt-bigquery |
| Dataflow | gcp-dataflow |
| Dataform | dataform-bigquery |
| Spark | gcp-spark |
| Provisioning | gcp-pipeline-resource-provisioning |
| Orchestration | gcp-pipeline-orchestration |
| Other | — (general GCP assistance) |
npx claudepluginhub gemini-cli-extensions/data-agent-kit-starter-pack --plugin dakDesign and troubleshoot GCP data pipelines including Dataflow (Apache Beam), Pub/Sub messaging, Dataproc (Spark/Hadoop), Cloud Composer (Airflow), and Dataplex governance.
Generates, updates, and deploys Google Cloud Composer orchestration pipelines for data pipelines including dbt, Spark, Dataform, notebooks, Python scripts, and BigQuery SQL. Creates deployment.yaml and orchestration YAML files.
Designs and implements scalable batch and streaming data pipelines, modern data warehouses, and lakehouse architectures using Spark, dbt, Airflow, and cloud-native platforms.