From vibedata-dbt-skills
Implement SCD Type 2 snapshots in dbt on Microsoft Fabric. Use when building snapshot models, choosing between timestamp and check strategies, or handling Fabric-specific datetime2 quirks. Also use when tracking slowly changing dimensions or debugging snapshot drift.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vibedata-dbt-skills:dbt-snapshot-scd2The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Hard-to-find patterns for implementing dbt snapshots on Fabric. Focuses on what breaks in practice, Fabric-specific datetime2 behaviors, and decision frameworks that go beyond "it depends."
Hard-to-find patterns for implementing dbt snapshots on Fabric. Focuses on what breaks in practice, Fabric-specific datetime2 behaviors, and decision frameworks that go beyond "it depends."
Do not default to "it depends." Use this decision tree:
timestamp when ALL of these are true:updated_at (or equivalent) columnupdated_at retroactivelycheck when ANY of these are true:updated_at column existsupdated_at only reflects metadata changes (e.g., profile edits but not address changes)check with check_cols: all on:When using check, always list explicit columns in check_cols rather than all.
Always snapshot source tables directly. Never snapshot staging views.
# CORRECT — snapshots the raw source
snapshots:
- name: snap_customers
relation: source('crm', 'customers')
config:
unique_key: customer_id
strategy: timestamp
updated_at: modified_date
# WRONG — staging views have no history
snapshots:
- name: snap_customers
relation: ref('stg_crm__customers') # This defeats the purpose
config:
unique_key: customer_id
strategy: timestamp
updated_at: modified_date
Why this matters:
ref()s the snapshot, not in the snapshot itselfThis is the single most common snapshot failure on Fabric that does not appear in official docs.
Fabric stores timestamps as datetime2 with configurable precision (0-7 fractional seconds). When updated_at has higher precision than what the snapshot comparison uses, rows appear "changed" on every run because the fractional seconds don't match after round-tripping through the snapshot table.
dbt_valid_to is set on rows that should still be currentdbt_valid_fromCast updated_at to a consistent precision in your snapshot query:
snapshots:
- name: snap_orders
relation: source('erp', 'orders')
config:
unique_key: order_id
strategy: timestamp
updated_at: updated_at_truncated
columns:
- name: updated_at_truncated
description: "Truncated to second precision to avoid datetime2 drift"
With a custom SQL block in the snapshot (legacy Jinja syntax, still supported):
{% snapshot snap_orders %}
{{ config(
target_schema='snapshots',
unique_key='order_id',
strategy='timestamp',
updated_at='updated_at_truncated'
) }}
select
*,
cast(updated_at as datetime2(0)) as updated_at_truncated
from {{ source('erp', 'orders') }}
{% endsnapshot %}
Key detail: datetime2(0) truncates to whole seconds. datetime2(3) keeps milliseconds. Choose the precision that matches your source system's actual granularity. If the source updates at second-level granularity, use datetime2(0).
Fabric datetime2 columns are timezone-naive. dbt snapshots use the database server's time for dbt_valid_from and dbt_valid_to. On Fabric, this is always UTC.
Implications:
America/New_York), updated_at and dbt_valid_from are in different timezonesdbt_valid_from/dbt_valid_to to local time in the martBy default, current records have dbt_valid_to = NULL. This is correct but makes queries harder:
-- Querying current state requires IS NULL check
select * from {{ ref('snap_customers') }}
where dbt_valid_to is null
Use dbt_valid_to_current to set a sentinel value instead:
snapshots:
- name: snap_customers
relation: source('crm', 'customers')
config:
unique_key: customer_id
strategy: timestamp
updated_at: modified_date
dbt_valid_to_current: "cast('9999-12-31' as datetime2(0))"
This makes range queries simpler and more performant:
-- With sentinel: simpler predicate, better query plan
select * from {{ ref('snap_customers') }}
where @as_of_date between dbt_valid_from and dbt_valid_to
Fabric-specific: The sentinel value must be a valid datetime2 expression. Use cast('9999-12-31' as datetime2(0)), not a raw string.
Use a composite unique_key when no single column uniquely identifies a row:
snapshots:
- name: snap_order_line_items
relation: source('erp', 'order_lines')
config:
unique_key:
- order_id
- line_item_id
strategy: timestamp
updated_at: modified_at
Missing a column: If the source has a 3-part natural key (order_id, line_item_id, version_id) but you only list two, the snapshot silently produces incorrect history — rows with different version_id values overwrite each other.
dbt_scd_id hash collision: The dbt_scd_id is a hash of the unique_key columns plus the updated_at (timestamp strategy) or check_cols values (check strategy). With composite keys, the hash inputs are concatenated. If column values can contain the delimiter character, collisions are possible.
Performance: On Fabric, the MERGE statement generated for composite keys includes an AND clause per key column. For tables >10M rows, this can be slow. Consider:
generate_surrogate_key() from tsql-utils to create a single synthetic keyNULL in key columns: If any unique_key column can be NULL, the MERGE ON clause fails silently (NULL = NULL is false in T-SQL). Fix by coalescing NULLs:
{% snapshot snap_with_nullable_key %}
{{ config(
target_schema='snapshots',
unique_key='composite_key',
strategy='timestamp',
updated_at='modified_at'
) }}
select
*,
concat(
cast(isnull(order_id, -1) as varchar(20)), '-',
cast(isnull(line_item_id, -1) as varchar(20))
) as composite_key
from {{ source('erp', 'order_lines') }}
{% endsnapshot %}
Scenario: A snapshot captured customer status = 'active' on Monday. On Wednesday, someone corrects the source to status = 'inactive' for Monday's data (a retroactive fix). The snapshot already has Monday's incorrect record.
dbt does NOT retroactively fix snapshot history. The snapshot will:
dbt_valid_to = next run time)dbt_valid_from = next run time)This means the snapshot history shows:
status = 'active' (incorrect, but this is what was observed)status = 'inactive' (the correction)The snapshot does NOT show what the source intended the data to be on Monday.
If your business requires "corrected" history, build a correction layer downstream:
-- dim_customers (downstream of snapshot)
-- Uses the LATEST known value for each validity period
select
customer_id,
status,
dbt_valid_from,
dbt_valid_to
from {{ ref('snap_customers') }}
-- Additional logic to detect and flag corrections
Configure hard_deletes to control what happens when a row disappears from the source:
| Option | Behavior | When to use |
|---|---|---|
ignore (default) | Deleted rows stay as current records forever | Source never deletes, or deletes are errors |
invalidate | Sets dbt_valid_to on the deleted row | Source uses soft-delete pattern but you want to reflect it |
new_record | Inserts a new row with dbt_is_deleted = 'True' | You need to explicitly track when something was deleted |
On Fabric, invalidate is the safest choice when source deletes are intentional — it preserves history without adding a new column.
Wrong: relation: ref('stg_crm__customers')
Right: relation: source('crm', 'customers')
Staging views have no history. The snapshot will capture identical data on every run.
If updated_at is NULL for some rows, the timestamp strategy skips those rows entirely — they are never snapshotted. Either:
check strategy insteadcast(isnull(updated_at, '1900-01-01') as datetime2(0)) as updated_at_safecheck_cols: all on a 50-column table means dbt compares all 50 columns for every row on every run. On tables >1M rows, this query can take 10+ minutes on Fabric. Always list only the columns you care about tracking.
dbt build runs everything in DAG order, including snapshots. Snapshots should typically run on their own cadence (matching source update frequency), not on every build. Use:
# Run snapshots separately
dbt snapshot --select snap_customers snap_orders
# Run models without snapshots
dbt build --exclude resource_type:snapshot
See the datetime2 precision section. This causes phantom changes and unbounded snapshot growth.
Snapshots and incremental models serve different purposes:
dbt_valid_from/dbt_valid_to.Don't use incremental models to "track changes" — use snapshots.
Searches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Implements vector databases with Pinecone, Weaviate, Qdrant, Milvus, pgvector for semantic search, RAG, recommendations, and similarity systems. Optimizes embeddings, indexing, and hybrid search.
npx claudepluginhub accelerate-data/vibedata-official --plugin vibedata-dbt-skills