Skill

dbt-workflow

Load FIRST before any dbt project work. Covers the full 5-step dbt workflow: project scanning, mapping, validation, contract understanding, SQL writing, and verification. Also covers output shape inference, incremental model handling, and what to trust in YML.

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/signalpilot-dbt:dbt-workflow

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash(dbt *)Bash(python3 *)

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This skill orchestrates the complete dbt project workflow. Load it FIRST whenever

Supporting Files

scan_project.pyvalidate_project.py

SKILL.md

177 lines · ~2.1k tokens

Stats

LanguagePython

Stars1

Forks1

MaintenanceExcellent

Last CommitMay 27, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

dbt Workflow Skill — Full Project Lifecycle

Overview

This skill orchestrates the complete dbt project workflow. Load it FIRST whenever working on a dbt project — it contains rules that affect how you interpret everything.

Project Scan Tool

To scan a dbt project, run:

python3 "${CLAUDE_SKILL_DIR}/scan_project.py" "<project_directory>"

This returns: models to build, stubs to rewrite, dependencies, required columns, sources, macros, and current_date hazards. Run this FIRST in Step 1.

The Workflow

Step 0 — Notion Context (if configured)

Call list_notion_integrations. If it returns integrations, Notion is available. If none exist, skip this step — Notion is optional.

If Notion is available: Load the /signalpilot-dbt:notion-context skill and run it. The skill will use the integration discovered here.

After gathering context:

The skill writes notion_context.md in the working directory (the notion-verify subagent reads this after the build).
Keep the context in working memory — reference it when making decisions about grain, joins, filters, and column logic in later steps.
If Notion context conflicts with YML, prefer YML for column names but prefer Notion for business logic (grain, filter rules, metric definitions).

If Notion is not configured: Skip this step. Proceed to Step 1.

Step 1 — Map the project

Run the project scan tool above with the dbt project directory, then call mcp__signalpilot__dbt_project_map project_dir="<your_project_dir>". The work order at the bottom is your plan. Use the project scan to identify:

STUBS TO REWRITE (models with placeholder SQL)
MODELS TO BUILD (models defined in YML but missing SQL)
DEPENDENCIES (build order)
REQUIRED COLUMNS (exact match from YML contracts)

Step 2 — Validate

Call mcp__signalpilot__dbt_project_validate project_dir="<your_project_dir>". Fix any parse errors before writing SQL.

Step 3 — Understand contracts + read siblings

For each model in the work order:

Call dbt_project_map with focus="model:<name>" for the column contract
If reference_snapshot.md exists, check it for the pre-existing row count and sample data. If present, that row count is your target.
If no reference exists, estimate the expected row count by querying source data: SELECT COUNT(DISTINCT <grain_key>) FROM <source> as an UPPER BOUND.
Read the SQL of any complete sibling model in the same directory that shares column names with your model. You MUST read sibling SQL before writing — do not skip this step.

Step 4 — Write and Build ALL Models

Load the /signalpilot-dbt:dbt-write skill + the SQL skill for your database (e.g. /signalpilot-dbt:duckdb-sql for DuckDB). Write SQL for EVERY model in the work order. For each model (in dependency order):

Read the YML contract — column names must match EXACTLY
Write the SQL
Run dbt run --select <model> to build it

After all stubs are written, rebuild them AND their downstream dependents: dbt run --select <stub1>+ <stub2>+ (the + suffix includes downstream models that depend on the stubs you wrote).

If errors, load /signalpilot-dbt:dbt-debugging skill and fix. Do NOT run a bare dbt run — it rebuilds ALL models including pre-existing ones you didn't touch, which can change surrogate key assignments and break FK relationships.

Step 5 — Verify

After your final dbt run completes, verify all models:

Confirm the database is queryable: query_database with SELECT 1
Use the Agent tool with subagent_type="verifier" to check all models you built

Step 6 — Notion Verification Report (if configured)

After the verifier completes, check if notion_context.md exists in the working directory. If it does, use the Agent tool with subagent_type="notion-verify" to write a traceability report to Notion documenting which context items influenced the build and how.

If notion_context.md does not exist, skip this step.

STOP when: the verifier completes (Step 5), and the notion-verify subagent

completes if applicable (Step 6).

Output Shape — Read YML Description BEFORE Writing SQL

Extract from description: field:

ENTITY: "for each customer/driver/order" → one row per qualifying entity
QUALIFIER: "due to returned items" / "with at least one order" → filter or INNER JOIN
RANK CONSTRAINT: "top N" / "ranks the top N" → exactly N output rows. Filter with ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.
TEMPORAL SCOPE: "rolling window", "MoM", "WoW", or "month-over-month" in the description → ONE output date (latest), not all historical dates. Filter with WHERE date_col = (SELECT MAX(date_col) FROM source).
PERIOD-OVER-PERIOD: If the description mentions MoM, WoW, YoY comparisons AND you are writing this model from scratch (stub/missing), the comparison column must be CAST(NULL AS DOUBLE) — see rule below.

How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:

Identify which source columns to use
Understand the business meaning of each column
Pick the right aggregation logic

But do NOT treat descriptions as literal computation instructions. After reading the description, always verify your logic against the actual source data.

Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>

Incremental Models and Period-Over-Period Columns

When a dbt project uses materialized="incremental" models, the project is designed to accumulate state over multiple runs. On a first run (full refresh, no prior state), incremental models build from scratch.

If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:

Output rows for the latest date only: WHERE date_col = (SELECT MAX(date_col) FROM source)
Period-over-period columns must be CAST(NULL AS DOUBLE) — there is no prior aggregated state to compare against.

What to Trust in YML

Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).

YML not_null tests on key/dimension columns (IDs, names, dates, categories) imply a WHERE col IS NOT NULL filter on input data. not_null on metric/aggregate columns just asserts the output shouldn't be NULL — don't filter inputs for those.

Do NOT trust YML for: grain/row count. YML unique and not_null tests are assertions that may be aspirational or wrong.

Derive the grain from these signals (in priority order):

Unique key structure: If the YML defines a unique/surrogate key, examine its composition
Column list: The columns themselves reveal the grain
Upstream model grain: Check existing upstream models
Source cardinality: Query source tables to check expected row count
Sibling model row counts: Check complete models at the same level

Do NOT deduplicate with ROW_NUMBER to force a unique test to pass.

Validate Project (Standalone)

Run: python3 "${CLAUDE_SKILL_DIR}/validate_project.py" "<project_dir>"

This runs dbt parse and produces a structured report of errors, warnings, and orphan patches (yml-defined models with no .sql file). Use this as the Step 2 validation when the MCP server is not available. Accepts an optional second argument for timeout in seconds (default 60).

Rules

NEVER run dbt commands with run_in_background or & — dbt holds a database write lock while running
Do NOT modify .yml files unless fixing a missing schema: in a source definition
Do NOT guess column names — use the YML contract as source of truth
Do NOT install external packages — all dbt packages are pre-bundled in dbt_packages/

dbt-workflow

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

dbt-workflow

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

dbt Workflow Skill — Full Project Lifecycle

Overview

Project Scan Tool

The Workflow

Step 0 — Notion Context (if configured)

Step 1 — Map the project

Step 2 — Validate

Step 3 — Understand contracts + read siblings

Step 4 — Write and Build ALL Models

Step 5 — Verify

Step 6 — Notion Verification Report (if configured)

STOP when: the verifier completes (Step 5), and the notion-verify subagent

Output Shape — Read YML Description BEFORE Writing SQL

Incremental Models and Period-Over-Period Columns

What to Trust in YML

Validate Project (Standalone)

Rules

Similar Skills

dbt Workflow Skill — Full Project Lifecycle

Overview

Project Scan Tool

The Workflow

Step 0 — Notion Context (if configured)

Step 1 — Map the project

Step 2 — Validate

Step 3 — Understand contracts + read siblings

Step 4 — Write and Build ALL Models

Step 5 — Verify

Step 6 — Notion Verification Report (if configured)

STOP when: the verifier completes (Step 5), and the notion-verify subagent

Output Shape — Read YML Description BEFORE Writing SQL

Incremental Models and Period-Over-Period Columns

What to Trust in YML

Validate Project (Standalone)

Rules

Similar Skills