From signalpilot-dbt
Load FIRST before any dbt project work. Covers the full 5-step dbt workflow: project scanning, mapping, validation, contract understanding, SQL writing, and verification. Also covers output shape inference, incremental model handling, and what to trust in YML.
How this skill is triggered — by the user, by Claude, or both
Slash command
/signalpilot-dbt:dbt-workflowThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill orchestrates the complete dbt project workflow. Load it FIRST whenever
This skill orchestrates the complete dbt project workflow. Load it FIRST whenever working on a dbt project — it contains rules that affect how you interpret everything.
To scan a dbt project, run:
python3 "${CLAUDE_SKILL_DIR}/scan_project.py" "<project_directory>"
This returns: models to build, stubs to rewrite, dependencies, required columns, sources, macros, and current_date hazards. Run this FIRST in Step 1.
Call list_notion_integrations. If it returns integrations, Notion is available.
If none exist, skip this step — Notion is optional.
If Notion is available: Load the /signalpilot-dbt:notion-context skill and
run it. The skill will use the integration discovered here.
After gathering context:
notion_context.md in the working directory (the
notion-verify subagent reads this after the build).If Notion is not configured: Skip this step. Proceed to Step 1.
Run the project scan tool above with the dbt project directory, then call
mcp__signalpilot__dbt_project_map project_dir="<your_project_dir>".
The work order at the bottom is your plan. Use the project scan to identify:
Call mcp__signalpilot__dbt_project_validate project_dir="<your_project_dir>".
Fix any parse errors before writing SQL.
For each model in the work order:
dbt_project_map with focus="model:<name>" for the column contractreference_snapshot.md exists, check it for the pre-existing row count and
sample data. If present, that row count is your target.SELECT COUNT(DISTINCT <grain_key>) FROM <source> as an UPPER BOUND.Load the /signalpilot-dbt:dbt-write skill + the SQL skill for your database
(e.g. /signalpilot-dbt:duckdb-sql for DuckDB). Write SQL for EVERY model in
the work order. For each model (in dependency order):
dbt run --select <model> to build itAfter all stubs are written, rebuild them AND their downstream dependents:
dbt run --select <stub1>+ <stub2>+ (the + suffix includes downstream
models that depend on the stubs you wrote).
If errors, load /signalpilot-dbt:dbt-debugging skill and fix. Do NOT run a bare
dbt run — it rebuilds ALL models including pre-existing ones you didn't touch,
which can change surrogate key assignments and break FK relationships.
After your final dbt run completes, verify all models:
query_database with SELECT 1subagent_type="verifier" to check all models you builtAfter the verifier completes, check if notion_context.md exists in the working
directory. If it does, use the Agent tool with subagent_type="notion-verify" to
write a traceability report to Notion documenting which context items influenced
the build and how.
If notion_context.md does not exist, skip this step.
completes if applicable (Step 6).
Extract from description: field:
ROW_NUMBER() ... <= N using a deterministic tiebreaker (add primary key to
ORDER BY). Do NOT use DENSE_RANK for filtering — it can return more than N rows.WHERE date_col = (SELECT MAX(date_col) FROM source).CAST(NULL AS DOUBLE) — see rule below.How to read YML descriptions: Descriptions tell you what the data MEANS, not what code to write. Use them to:
But do NOT treat descriptions as literal computation instructions. After reading the description, always verify your logic against the actual source data.
Write at top of SQL: -- EXPECTED SHAPE: <row count or formula> — REASON: <quote>
When a dbt project uses materialized="incremental" models, the project is
designed to accumulate state over multiple runs. On a first run (full refresh,
no prior state), incremental models build from scratch.
If you are writing a new model that includes period-over-period metrics (MoM, WoW, YoY) and the project has not been run incrementally before:
WHERE date_col = (SELECT MAX(date_col) FROM source)CAST(NULL AS DOUBLE) — there is no prior
aggregated state to compare against.Trust YML for: column names (exact match required), column descriptions (what each column represents), ref dependencies (what tables to join).
YML not_null tests on key/dimension columns (IDs, names, dates, categories)
imply a WHERE col IS NOT NULL filter on input data. not_null on metric/aggregate
columns just asserts the output shouldn't be NULL — don't filter inputs for those.
Do NOT trust YML for: grain/row count. YML unique and not_null tests are
assertions that may be aspirational or wrong.
Derive the grain from these signals (in priority order):
Do NOT deduplicate with ROW_NUMBER to force a unique test to pass.
Run: python3 "${CLAUDE_SKILL_DIR}/validate_project.py" "<project_dir>"
This runs dbt parse and produces a structured report of errors, warnings,
and orphan patches (yml-defined models with no .sql file). Use this as the
Step 2 validation when the MCP server is not available. Accepts an optional
second argument for timeout in seconds (default 60).
dbt commands with run_in_background or & — dbt holds a database
write lock while running.yml files unless fixing a missing schema: in a source definitionSearches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Implements vector databases with Pinecone, Weaviate, Qdrant, Milvus, pgvector for semantic search, RAG, recommendations, and similarity systems. Optimizes embeddings, indexing, and hybrid search.
npx claudepluginhub signalpilot-labs/signalpilot-plugin --plugin signalpilot-dbt