Given an Entropy Data data product URL or id, fetch its ODCS, generate Snowflake dbt models, run dbt-ol (ship lineage to Entropy Data on the spot), run dbt tests, and run datacontract tests — end-to-end in one go. Demo-grade. Trigger when the user asks to "implement the data product <url-or-id> [from its data contract]", "build a data product that implements its data contract", "build the dbt pipeline for this data product", "scaffold dbt models from a data contract", or any close variant referring to implementing, building, or materializing an existing published data product against its contract.
Slash command: /dataproduct-builder-demo:dataproduct-implement
Turn an Entropy Data data product into a working Snowflake dbt pipeline and prove it works — in one pass. The data contract (ODCS) is the source of truth for the output schema; this skill reads it, writes dbt artifacts that produce data matching the contract, runs everything against your Snowflake target, and ships an OpenLineage event so the pipeline shows up in Entropy Data immediately.
If the dbt project or the data product does not exist yet, run dataproduct-bootstrap first, then come back here.
${PLUGIN_ROOT} below refers to the root of this plugin, the directory that contains skills/. On Claude Code it is set automatically as ${CLAUDE_PLUGIN_ROOT}; use that. On any other agent (Codex, Copilot CLI, etc.) it is unset; resolve it as ../.. relative to this SKILL.md file's directory (i.e. the grandparent of skills/<this-skill>/).
Read every contract-specified value from the contract; never hardcode a literal. Server name, schema, table, and types are read at run time from the data contract (e.g. yq '.servers[0].server' <contract-file>); baking a fixed value into a command is a bug, even if it happens to match today. The data product id, when not supplied by the user, is the id in the local *.odps.yaml.
Before running Step 0, print this plan to the user verbatim:

Running dataproduct-implement. I'll:
- Pre-checks: confirm this is a dbt project; dbt, dbt-ol, datacontract, entropy-data, jq, and yq are on PATH; the Entropy Data API key is available from entropy-data connection; Snowflake credentials are readable from ~/.dbt/profiles.yml.
- Resolve the data product by id (entropy-data dataproducts get <id>).
- Fetch each output port's data contract (entropy-data datacontracts get) and save it to models/output_ports/v<N>/. Remote contract is the source of truth; the local file is always overwritten.
- Validate the contract against the target platform's conventions (e.g. UPPERCASE identifiers on Snowflake). If fixable bugs are found, offer to patch and publish the corrected contract back to Entropy Data.
- Translate the ODCS schema into dbt models: append missing column projections to models/output_ports/v<N>/<table>.sql, missing column entries to models/output_ports/v<N>/_models.yml. Existing SQL and tests are preserved byte-identical.
- Wire input ports from active access agreements, write sources, project columns 1:1, leave the rest as TODOs.
- Run dbt parse to catch syntax errors.
- Run dbt-ol run against the user's Snowflake target; this builds the tables AND ships the lineage event to Entropy Data on the spot.
- Run dbt test against the same target.
- Run datacontract test against each output-port contract.
- Stamp the data product on Entropy Data with the dataProductBuilder customProperty.
- Trigger a Snowflake re-ingest so the platform's asset inventory picks up the new tables (entropy-data integrations run).
- Summarize what was generated, what ran, and what's still TODO.
Then proceed.
Confirm dbt_project.yml exists at the working directory root. If not, route the user to dataproduct-bootstrap and stop.
Sync the project's venv with the full toolchain. uv sync is idempotent and pulls everything listed under pyproject.toml's [dependency-groups].dev (dbt-core, dbt-snowflake, openlineage-dbt, datacontract-cli[snowflake], entropy-data) at the versions pinned in uv.lock:
uv sync
All subsequent CLI invocations in this skill use uv run <cli> — that resolves to the venv's pinned binary without needing to activate. dbt, dbt-ol, and datacontract share one Python env this way, so the Snowflake adapter is visible to all three. Do not propose uv tool install here — per-project venv is the convention.
Confirm uv run --quiet entropy-data connection test succeeds. Otherwise stop and tell the user to run uv run entropy-data connection add <name> --host <host> --api-key <key>.
Confirm jq and yq are on PATH (system binaries, not pip-installed). Otherwise stop with brew install jq yq (macOS) or the apt equivalent.
Assume the user has a working Snowflake dbt profile. Don't audit ~/.dbt/profiles.yml; if it's misconfigured, uv run dbt parse / uv run dbt-ol run will surface a clear error.
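The PATH check for the system binaries in the pre-checks above could be sketched as follows. This is a minimal illustration, not the skill's actual implementation; the helper name `missing_tools` is hypothetical, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # jq and yq are the system binaries named in the pre-checks.
    missing = missing_tools(["jq", "yq"])
    if missing:
        print("Missing system binaries: " + ", ".join(missing))
    else:
        print("jq and yq are on PATH")
```

If the returned list is non-empty, the pre-check stops and prints the install hint instead of continuing.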
Resolve DATA_PRODUCT_ID: if the user gave a full URL (https://app.entropy-data.com/<org>/dataproducts/<id>) take the trailing id; if they gave a bare id use it; otherwise read .id from the single *.odps.yaml in the working directory (yq '.id' *.odps.yaml). If none of these yields an id, ask the user.
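The resolution order above (full URL, bare id, local ODPS file, then ask) can be illustrated with a short sketch. The function name and the `local_odps_id` parameter are hypothetical; the caller is assumed to have already read `.id` from the single local *.odps.yaml, and the organization and product names in the usage note are made up.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_data_product_id(
    user_input: Optional[str],
    local_odps_id: Optional[str] = None,
) -> Optional[str]:
    """Return the data product id, or None when the user must be asked."""
    text = (user_input or "").strip()
    if text.startswith(("http://", "https://")):
        # Full URL: take the trailing path segment as the id.
        segments = [s for s in urlparse(text).path.split("/") if s]
        if segments:
            return segments[-1]
    elif text:
        # Bare id: use it as given.
        return text
    # Fall back to the id read from the local ODPS file, if any.
    return local_odps_id or None
```

For example, `resolve_data_product_id("https://app.entropy-data.com/acme/dataproducts/orders")` yields `orders`, and a `None` result is the signal to ask the user.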
Load the data product ODPS via the entropy-data CLI: run entropy-data dataproducts get "$DATA_PRODUCT_ID" -o yaml. Remember the response as DATA_PRODUCT. Extract:
- DATA_PRODUCT_ID, DATA_PRODUCT_NAME, owning team, purpose

Always use the entropy-data CLI for any connection to Entropy Data (data products, data contracts, access, publishing). Do not use the Entropy Data MCP server for these calls.
If the data product has more than one output port, ask the user which one(s) to implement. Default to all.
If the data product does not exist on Entropy Data, stop and ask the user whether to create it via dataproduct-bootstrap first. This demo skill does not create platform records itself.
For each selected output port: entropy-data datacontracts get <contract-id> -o yaml, written to models/output_ports/v<N>/<contract-id>.odcs.yaml (where <contract-id> is the contract's id field and <N> is the major version derived from CONTRACT.version, default v1). Co-located with the SQL it governs, matching dataproduct-builder-dbt's layout. Remember as CONTRACT.
Always overwrite the local file with the remote response. The remote contract is the spec; the local SQL is the implementation. A divergence is the whole point of the run — someone changed the contract and the implementation needs to catch up. (Step 3 appends column projections to the existing SQL without touching CTEs / joins / filters, so the implementation logic is safe.)
From CONTRACT you'll need schema (table + properties: logicalType, required, primaryKey, unique, description, classification) and servers (Snowflake server the contract test runs against).
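As an illustration of where the fetched contract lands, the sketch below derives the major version and the target path. It assumes CONTRACT.version is a semver-like string (optionally prefixed with "v"); both helper names are hypothetical.

```python
import re
from typing import Optional


def major_version(version: Optional[str]) -> int:
    """Extract the leading integer of a semver-like string; default to 1."""
    match = re.match(r"\s*v?(\d+)", version or "")
    return int(match.group(1)) if match else 1


def contract_path(contract_id: str, version: Optional[str]) -> str:
    """Build the local path for the contract, co-located with the SQL it governs."""
    return f"models/output_ports/v{major_version(version)}/{contract_id}.odcs.yaml"
```

A contract with a missing or unparseable version falls back to the v1 directory, matching the default stated above.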
Before generating dbt artifacts, scan the contract for bugs that would break tests downstream and offer to fix them in one pass — closing the drift loop the same run, with the patched contract published back to Entropy Data.
The validation is keyed off servers[].type. For each declared server type, apply that platform's conventions; skip schema-only contracts. Today only Snowflake is enforced (it's the only target this skill supports), but the structure is intentionally per-platform so other targets can be added without rewriting the rule.
Snowflake checks — for every property in every schema covered by a type: snowflake server. Server-level identifiers (database, schema, table names) are part of the demo's static setup — leave them as-is; this step only addresses property-level drift introduced by users editing the contract through the UI.
- name. datacontract-cli (≥ 0.11.5) quotes the contract name verbatim. Any name containing a lowercase letter is queried as "<lowercase>" and fails to match the stored UPPERCASE column. Normalize the name to UPPERCASE.
- physicalName. A physicalName whose value equals the UPPERCASE form of name adds no information once name is normalized. Drop it.

If nothing is flagged, continue silently to Step 3.
Otherwise, surface a single confirmation listing every fix (one bullet per property × issue), then ask:

Found N convention issue(s) for Snowflake on contract <CONTRACT_ID>:
- property <old-name>: rename name → <NEW-NAME>
- property <NEW-NAME>: drop redundant physicalName: <value>

Apply, save to models/output_ports/v<N>/<contract-id>.odcs.yaml, and publish the corrected contract back to Entropy Data? [Y/n]
On Y:

1. Patch the local contract file; yq -i works for surgical updates. One rename + one physicalName delete per flagged property. Keep version unchanged (this is a convention fix, not a schema change consumers need to see as a new version).
2. Publish with entropy-data datacontracts put <CONTRACT_ID> --file models/output_ports/v<N>/<contract-id>.odcs.yaml. Surface any non-2xx response and stop.
3. Reload CONTRACT from the patched file. Continue to Step 3.

On n: continue with a warning that Step 8's datacontract test will fail on the un-normalized properties. Don't ask again this run.
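The two Snowflake checks and their fixes can be sketched in a few lines. This is an illustration under assumptions, not the skill's code: the contract is taken as an already-parsed dict with a top-level `schema` list whose entries carry a `properties` list, and the function names are hypothetical.

```python
import copy
from typing import Any, Dict, List, Tuple

Contract = Dict[str, Any]


def _targets_snowflake(contract: Contract) -> bool:
    return any(s.get("type") == "snowflake" for s in contract.get("servers", []))


def snowflake_issues(contract: Contract) -> List[Tuple[str, str]]:
    """List (property, fix) pairs; empty for schema-only contracts."""
    if not _targets_snowflake(contract):
        return []
    issues = []
    for table in contract.get("schema", []):
        for prop in table.get("properties", []):
            name = prop["name"]
            upper = name.upper()
            if name != upper:  # contains at least one lowercase letter
                issues.append((name, f"rename name -> {upper}"))
            if prop.get("physicalName") == upper:
                issues.append((upper, f"drop redundant physicalName: {upper}"))
    return issues


def apply_snowflake_fixes(contract: Contract) -> Contract:
    """Return a patched copy; the input and its version are left untouched."""
    fixed = copy.deepcopy(contract)
    if not _targets_snowflake(fixed):
        return fixed
    for table in fixed.get("schema", []):
        for prop in table.get("properties", []):
            prop["name"] = prop["name"].upper()
            if prop.get("physicalName") == prop["name"]:
                del prop["physicalName"]
    return fixed
```

Running `snowflake_issues` again on the patched copy should return an empty list, which is a cheap sanity check before publishing.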
Output column identifier rule (applies to every property in this step and Step 4). For every contract property, use the property's name directly as the SQL alias and the _models.yml columns: - name: entry for this table. On Snowflake, contract property names are UPPERCASE by convention (e.g. SKU, ARTICLE_NAME) — Snowflake folds unquoted identifiers to uppercase and datacontract test's soda-core driver quotes the contract name verbatim when querying. Using uppercase in the contract makes the SQL alias, the materialized column, the dbt tests, and datacontract test's lookup all line up against the same Snowflake-stored identifier. Don't rely on physicalName to bridge a case mismatch — current datacontract-cli (≥ 0.11.5) quotes by name, not physicalName, so a contract with name: sku + physicalName: SKU fails on Snowflake with "Required Column Missing".
For each contract:
Decide the dbt-side table name. Default: the schema[0].name from the contract. Confirm with the user if it differs from the output port's server table.
Identify candidate input ports. Run entropy-data access list --consumer-dataproduct <DATA_PRODUCT_ID> -o json to list active access agreements. Each entry's provider.dataProductId / provider.outputPortId is an input port this product can read. Keep agreements with info.active: true; ignore pending / rejected. If models/input_ports/<provider-output-port-id>.source.yaml already exists for an agreement, treat it as authoritative.
Generate or update models/output_ports/v<N>/<table>.sql. The file may already exist with non-trivial business logic — CTEs, joins, window functions — never rewrite it. Only two edits are allowed:
- New file: write a select that projects each contract column as cast(... as <snowflake-type>) as <OUT_COL> (OUT_COL per the rule above); leave the from clause as a TODO citing the candidate input ports from Step 3.2.
- Existing file: in the select block, append cast(... as <type>) as <OUT_COL> for every contract column not already projected, in contract order, fixing the trailing comma. Everything else (CTEs, joins, filters, existing projections) stays byte-identical.

The file must start with:
{{ config(materialized='table', schema='op_<table>_v<N>') }}
-- Governed by <contract-id>.odcs.yaml (ODCS id: <CONTRACT_ID>)
schema='op_<table>_v<N>' is taken literally by the generate_schema_name macro in macros/get_custom_schema.sql (per dataproduct-builder-dbt's dataproduct-dbt convention op_<output-port-id>_v<N>; in this demo plugin the output port id equals the table name, matching the convention's worked example op_customer_activity_v1). The materialized schema is exactly op_<table>_v<N> (UPPERCASE in Snowflake) — matching the contract's servers[].schema and isolated from internal staging/intermediate models in internal_<dbt_project_name>. For <table> use the snake_case output-port table name; for <N> the major version.
Append (or create) models/output_ports/v<N>/_models.yml — one shared YAML file per version layer, matching dataproduct-builder-dbt's convention. New file → create with the structure below. Existing file → append a new models: entry for this <table>, plus columns: entries for any contract columns not already listed under it; leave existing entries alone.

    version: 2
    models:
      - name: <table>
        description: <from contract>
        config:
          meta:
            data_contract:
              id: <CONTRACT_ID>
              file: models/output_ports/v<N>/<contract-id>.odcs.yaml
            owner: <team>
          materialized: table
          contract:
            enforced: true
        columns:
          - name: <col>
            description: <from contract>
            data_type: <UPPERCASE Snowflake type>
            constraints:
              - type: not_null # required: true
              - type: unique # unique: true or primaryKey: true

ODCS → dbt: required: true → not_null constraint, unique: true or primaryKey: true → unique + not_null constraints, enum → accepted_values (under data_tests:, not constraints:).
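The mapping rule above can be expressed as a small function. This sketch assumes each contract property is a dict and that enumerated values sit under an `enum` key on the property, which may differ from how a given contract encodes them; the function name is hypothetical, and the `accepted_values` shape follows the classic dbt YAML form.

```python
from typing import Any, Dict, List


def dbt_column_rules(prop: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one ODCS property into a dbt column entry (constraints and tests only)."""
    constraints: List[str] = []
    if prop.get("unique") or prop.get("primaryKey"):
        # unique or primaryKey implies both constraints
        constraints = ["not_null", "unique"]
    elif prop.get("required"):
        constraints = ["not_null"]

    column: Dict[str, Any] = {
        "name": prop["name"],
        "constraints": [{"type": c} for c in constraints],
    }
    if prop.get("enum"):
        # accepted_values belongs under data_tests, not constraints
        column["data_tests"] = [{"accepted_values": {"values": list(prop["enum"])}}]
    return column
```

A property that is neither required, unique, nor a primary key produces an empty constraints list, so callers may prefer to omit the key in that case.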
Map ODCS logicalType to Snowflake types:
| ODCS logicalType | Snowflake |
|---|---|
| string / text | varchar |
| integer / long | number |
| decimal / numeric | number(38,9) |
| boolean | boolean |
| timestamp | timestamp_ntz |
| date | date |
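The table above translates directly into a lookup. The sketch below is one possible encoding; the names `SNOWFLAKE_TYPES` and `snowflake_type` are hypothetical, and raising on an unmapped logicalType is a design assumption (the source does not say what to do with types outside the table).

```python
# logicalType -> Snowflake type, copied from the mapping table.
SNOWFLAKE_TYPES = {
    "string": "varchar",
    "text": "varchar",
    "integer": "number",
    "long": "number",
    "decimal": "number(38,9)",
    "numeric": "number(38,9)",
    "boolean": "boolean",
    "timestamp": "timestamp_ntz",
    "date": "date",
}


def snowflake_type(logical_type: str) -> str:
    """Look up the Snowflake type for an ODCS logicalType (case-insensitive)."""
    key = logical_type.strip().lower()
    if key not in SNOWFLAKE_TYPES:
        # Assumption: fail loudly rather than guess a type.
        raise ValueError(f"no Snowflake mapping for logicalType {logical_type!r}")
    return SNOWFLAKE_TYPES[key]
```

The _models.yml template asks for the UPPERCASE form in data_type, so a caller would apply `.upper()` there while keeping the lowercase form for the cast in SQL.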
For each output port:
Declare each candidate input port as a dbt source — for every access agreement from Step 3.2:
Fetch the provider data product (entropy-data dataproducts get <provider-data-product-id> -o yaml) to resolve the server (database/schema/table) and linked contract id.
Fetch the upstream contract (entropy-data datacontracts get <provider-contract-id> -o yaml) and write it to models/input_ports/<provider-output-port-id>.odcs.yaml as a trust snapshot.
Write models/input_ports/<provider-output-port-id>.source.yaml:

    version: 2
    sources:
      - name: <provider-data-product-id>_<provider-output-port-id>
        database: <database>
        schema: <schema>
        config:
          meta:
            data_contract:
              id: <provider-contract-id>
              file: models/input_ports/<provider-output-port-id>.odcs.yaml
        tables:
          - name: <table>
            description: <from contract>
            columns:
              - name: <col>
                description: <from contract>
                data_type: <snowflake type from the type map in Step 3>

One pair (*.odcs.yaml + *.source.yaml) per agreement. Surface diffs and ask before overwriting an existing file.
Match input columns to output columns, in this order. Stop at the first signal that yields exactly one candidate.
1. Shared semantic definition: both columns carry a type: semantics entry in authoritativeDefinitions whose URL ends in the same path segment after normalization (lowercase, strip non-alphanumeric, so …/processedTimestamp matches …/processed_timestamp). Scheme, host, and org-id prefix differences don't disqualify.
2. Name-token containment: split both names on _ and case boundaries; the shorter side's tokens are all contained in the longer side's. Covers patterns like <X>_NAME ⊃ <x>, <DOMAIN>_<X> ⊃ <x>, <X>_TIMESTAMP ⊃ timestamp. Generic single-token output names (id, name, type, value, code, key) need a second signal: require (1) or a description echo (the upstream column's description names the output concept) before treating as a hit.

If exactly one upstream column matches, project cast(<input_col> as <snowflake_type>) as <OUT_COL> (OUT_COL per the output column identifier rule in Step 3). If multiple match, write cast(null as <type>) as <OUT_COL> -- TODO: candidates: <names> and list them so the user can pick. If none match, write cast(null as <type>) as <OUT_COL> -- TODO: source <description>.
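The matching signals and the three projection outcomes described above can be sketched as follows. All function names are hypothetical, and the token splitter is one reasonable reading of "split on _ and case boundaries" (it splits where a lowercase letter or digit is followed by an uppercase letter).

```python
import re
from typing import List, Set

GENERIC_NAMES = {"id", "name", "type", "value", "code", "key"}


def normalize_segment(url: str) -> str:
    """Last path segment, lowercased, with non-alphanumerics stripped."""
    segment = url.rstrip("/").rsplit("/", 1)[-1]
    return re.sub(r"[^a-z0-9]", "", segment.lower())


def tokens(name: str) -> Set[str]:
    """Split on underscores and lower-to-upper case boundaries."""
    parts = re.split(r"_|(?<=[a-z0-9])(?=[A-Z])", name)
    return {part.lower() for part in parts if part}


def tokens_contained(output_col: str, input_col: str) -> bool:
    """True when the shorter side's tokens all appear in the longer side's."""
    a, b = tokens(output_col), tokens(input_col)
    if not a or not b:
        return False
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return shorter <= longer


def needs_second_signal(output_col: str) -> bool:
    """Generic single-token names are not trusted on token containment alone."""
    return output_col.lower() in GENERIC_NAMES


def projection(out_col: str, sf_type: str, candidates: List[str], description: str) -> str:
    """Render the select-list entry for one output column."""
    if len(candidates) == 1:
        return f"cast({candidates[0]} as {sf_type}) as {out_col}"
    if candidates:
        names = ", ".join(candidates)
        return f"cast(null as {sf_type}) as {out_col} -- TODO: candidates: {names}"
    return f"cast(null as {sf_type}) as {out_col} -- TODO: source {description}"
```

Two semantics URLs count as the same definition when `normalize_segment` returns equal strings for both, regardless of scheme or host.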
Write the SQL body.
- Single input port: replace the TODO from with from {{ source('<provider-data-product-id>_<provider-output-port-id>', '<table>') }}.
- Multiple input ports: leave the join as a TODO that lists each {{ source(...) }} and the join keys.
- Derived columns: keep null as <col> with a -- TODO: compute <description> comment.

dbt parse

Run dbt parse. If it fails, surface the error, fix obvious mistakes (wrong source name, typos in _models.yml), and re-run. Do not proceed to Step 6 with a failing parse.
dbt-ol run (this is where lineage gets shipped)

Confirm with the user: "Run dbt-ol run against your Snowflake target now? This materializes the models in Snowflake and ships the lineage event to Entropy Data immediately." Wait for explicit yes.
If the user has TODOs left in any output-port model (unwired from, derived columns, multi-source joins), warn them that the run will fail those models. Offer to scope to only the models with no TODOs: dbt-ol run --select <wired-model-1> <wired-model-2>.
Run with both OpenLineage env vars derived inline from the active entropy-data connection (target inferred from dbt_project.yml's profile: — usually dev locally):
OPENLINEAGE__TRANSPORT__URL=$(entropy-data connection get -o json | jq -r .host) \
OPENLINEAGE__TRANSPORT__AUTH__APIKEY=$(entropy-data connection get -o json | jq -r .api_key) \
dbt-ol run --target <target>
Both env vars must be set on the same command. The committed openlineage.yml intentionally omits url:, so a run with only __APIKEY fails with RuntimeError: 'url' key not passed to HttpConfig before dbt even starts. The fix is to add __URL back to the same invocation, not to retry with just __URL set.
Capture stdout and exit code. Non-zero means at least one model failed; surface the dbt log section, do not retry silently.
If dbt-ol run succeeded, the data product is now visible with materialized tables AND a lineage event in Entropy Data. Tell the user this explicitly in the final report — it is the whole point of the demo.
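The rule that both OpenLineage variables must accompany the same dbt-ol invocation can be enforced before the command is launched. The sketch below assumes the connection JSON exposes `host` and `api_key`, as the jq filters in the command above suggest; the function name is hypothetical.

```python
from typing import Dict, Mapping


def openlineage_env(connection: Mapping[str, str]) -> Dict[str, str]:
    """Build both OpenLineage variables, or fail before dbt-ol is started."""
    missing = [key for key in ("host", "api_key") if not connection.get(key)]
    if missing:
        raise ValueError("connection is missing: " + ", ".join(missing))
    return {
        "OPENLINEAGE__TRANSPORT__URL": connection["host"],
        "OPENLINEAGE__TRANSPORT__AUTH__APIKEY": connection["api_key"],
    }
```

A caller would merge the result into a copy of the current environment (for instance `{**os.environ, **openlineage_env(conn)}`) and pass it as `env` to `subprocess.run` for the dbt-ol command, so a half-configured run never starts.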
dbt test

dbt test --target <target>

This runs the contract-derived tests (not_null, unique, accepted_values) added in Step 3. Surface failures by model and test name.
datacontract test

For each output-port contract, derive the Snowflake credentials from the dbt profile inline; don't require the user to set DATACONTRACT_SNOWFLAKE_* in their shell:
CONTRACT_FILE=models/output_ports/v<N>/<contract-id>.odcs.yaml
SERVER=$(yq '.servers[0].server' "$CONTRACT_FILE") # from the contract — never a hardcoded literal
PROFILE=$(yq '.profile' dbt_project.yml)
TARGET=$(yq ".${PROFILE}.target" ~/.dbt/profiles.yml) # or the --target passed earlier
DATACONTRACT_SNOWFLAKE_USERNAME=$(yq ".${PROFILE}.outputs.${TARGET}.user" ~/.dbt/profiles.yml) \
DATACONTRACT_SNOWFLAKE_PASSWORD=$(yq ".${PROFILE}.outputs.${TARGET}.password" ~/.dbt/profiles.yml) \
DATACONTRACT_SNOWFLAKE_ROLE=$(yq ".${PROFILE}.outputs.${TARGET}.role" ~/.dbt/profiles.yml) \
DATACONTRACT_SNOWFLAKE_WAREHOUSE=$(yq ".${PROFILE}.outputs.${TARGET}.warehouse" ~/.dbt/profiles.yml) \
datacontract test "$CONTRACT_FILE" --server "$SERVER" --logs
--logs ensures the failure detail (field + rule) is in stdout. Non-zero exit means at least one rule failed. Capture per-contract result for the final report.
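The same credential derivation can be written against an already-parsed profiles document. This sketch assumes ~/.dbt/profiles.yml has been loaded into a dict by the caller (no YAML library is used here) and that the target uses password authentication; the function name is hypothetical.

```python
from typing import Any, Dict, Mapping, Optional

# DATACONTRACT_SNOWFLAKE_<suffix> -> key in the dbt profile output
_PROFILE_KEYS = {
    "USERNAME": "user",
    "PASSWORD": "password",
    "ROLE": "role",
    "WAREHOUSE": "warehouse",
}


def datacontract_snowflake_env(
    profiles: Mapping[str, Any],
    profile: str,
    target: Optional[str] = None,
) -> Dict[str, str]:
    """Map one dbt profile output onto the variables datacontract test reads."""
    entry = profiles[profile]
    # Use the explicit target if given, else the profile's default target.
    output = entry["outputs"][target or entry["target"]]
    return {
        f"DATACONTRACT_SNOWFLAKE_{suffix}": str(output[key])
        for suffix, key in _PROFILE_KEYS.items()
    }
```

A missing key raises KeyError, which is a reasonable point to stop and tell the user which profile field is absent.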
Check DATA_PRODUCT.customProperties for an entry property: "dataProductBuilder" with value: "https://github.com/entropy-data/dataproduct-builder-demo". If already present, skip.
Otherwise:
entropy-data dataproducts get <DATA_PRODUCT_ID> -o yaml > /tmp/<DATA_PRODUCT_ID>.odps.yaml
Append to the top-level customProperties:

    customProperties:
      - property: "dataProductBuilder"
        value: "https://github.com/entropy-data/dataproduct-builder-demo"

entropy-data dataproducts put <DATA_PRODUCT_ID> --file /tmp/<DATA_PRODUCT_ID>.odps.yaml
Delete the temp file.
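The check-then-append logic of this step is idempotent, which a short sketch makes explicit. It operates on the data product as a parsed dict; the function name `stamp` and the returned `changed` flag are hypothetical conveniences for the final report.

```python
import copy
from typing import Any, Dict, Tuple

BUILDER_PROPERTY = {
    "property": "dataProductBuilder",
    "value": "https://github.com/entropy-data/dataproduct-builder-demo",
}


def stamp(data_product: Dict[str, Any]) -> Tuple[Dict[str, Any], bool]:
    """Return (document, changed); changed is False when the stamp already exists."""
    existing = data_product.get("customProperties") or []
    for entry in existing:
        if (entry.get("property") == BUILDER_PROPERTY["property"]
                and entry.get("value") == BUILDER_PROPERTY["value"]):
            return data_product, False
    stamped = copy.deepcopy(data_product)
    stamped["customProperties"] = list(existing) + [dict(BUILDER_PROPERTY)]
    return stamped, True
```

Only when `changed` is true does the document need to be written to the temp file and published; otherwise the step is reported as already present.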
The Entropy Data platform sees a Snowflake table only after its Snowflake integration ingests the schema. Until the next ingestion, the table dbt just materialized doesn't show up under the data product, and the schema-drift warning on the data contract page won't clear. Trigger a manual run so the inventory catches up.
The integration to trigger is the one that scans the output port's database. Find it:
entropy-data integrations list --source snowflake -o json \
| jq '.[] | {ingestionId, externalId, name}'
If exactly one Snowflake integration is configured (the common case on demo orgs), grab its externalId and trigger it:
entropy-data integrations run <externalId>
The call returns immediately with a scheduledAt timestamp (202). The ingestion runs in the background — typically a few minutes on a small Snowflake account, longer on larger ones. Don't pass --wait from this skill; the ingest is not on the critical path for the demo, and waiting would block the final report.
If the listing returns multiple Snowflake integrations, match the one whose assetOwnerTeamExternalId equals the data product's team external id; if still ambiguous, ask the user to pick and re-run with that externalId. If none are configured, skip this step entirely and call it out in the final report — the data product still works; the asset inventory just lags until the next scheduled run.
If the call returns 409 (already_running), don't try to cancel — note it in the final report ("re-ingest already in flight") and continue.
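The selection rules for the integration to trigger (none, exactly one, several) can be sketched as a pure function over the listing. It assumes each listed integration is a dict with `externalId` and, where present, `assetOwnerTeamExternalId`; the function name and the status strings are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def pick_integration(
    integrations: List[Dict[str, Any]],
    team_external_id: Optional[str],
) -> Tuple[Optional[str], str]:
    """Return (externalId or None, reason) following the rules in this step."""
    if not integrations:
        return None, "skipped: no Snowflake integration"
    if len(integrations) == 1:
        return integrations[0]["externalId"], "only integration"
    # Several integrations: prefer the one owned by the data product's team.
    owned = [
        item for item in integrations
        if item.get("assetOwnerTeamExternalId") == team_external_id
    ]
    if len(owned) == 1:
        return owned[0]["externalId"], "matched owning team"
    return None, "ambiguous: ask the user"
```

A `None` id with the ambiguous reason is the cue to ask the user to pick; a `None` id with the skipped reason goes straight into the final report.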
End with this two-part recap. Use the Status enum: created, updated, already present, passed, failed, skipped.
Part 1 — outcome table.
| Artifact | Status | Details |
|---|---|---|
| Data product | already present | <DATA_PRODUCT_ID> |
| dataProductBuilder customProperty | … | "added" / "already present" |
| Output-port data contract <CONTRACT_ID> | … | models/output_ports/v<N>/<contract-id>.odcs.yaml |
| Contract validation | … | "passed" / "normalized & republished: <property> × N" / "issues found, user declined fix" |
| Input-port contracts | … | <N> files at models/input_ports/<...>.odcs.yaml |
| Input-port sources | … | <N> files at models/input_ports/<...>.source.yaml |
| Model <table>.sql | … | models/output_ports/v<N>/<table>.sql: "wired to <source>" / "join TODO" / "skipped per user" |
| _models.yml columns added | … | counts (only columns newly appended from the contract to models/output_ports/v<N>/_models.yml; existing columns untouched) |
| dbt parse | … | passed / failed: <reason> |
| dbt-ol run | … | "passed: N models materialized, lineage shipped to <API_HOST>" / "failed" / "skipped" |
| dbt test | … | "passed: N tests" / "failed: N of M" / "skipped" |
| datacontract test | … | per contract: "passed" / "failed: " / "skipped" |
| Snowflake re-ingest | … | "triggered: <integration-externalId>" / "already running" / "skipped: no Snowflake integration" |
Part 2 — next steps. Bullet list, only what applies:
- For each failed row, the concrete next action (which model, which test, which contract rule).
- If dbt-ol run succeeded, link the user to <API_HOST>/<ORG_SLUG>/dataproducts/<DATA_PRODUCT_ID> so they can see the lineage event land. Derive <ORG_SLUG> from the active connection: entropy-data connection get -o json | jq -r .vanity_url. The <ORG_SLUG> segment is required; without it the app routes to a 404.
- For CI, set the required secrets (ENTROPY_DATA_API_KEY, DBT_SNOWFLAKE_*) so the CI run reproduces the local run.

If everything passed and there are no TODOs, write: Pipeline implemented, materialized, tested, and lineage published. Nothing else to do.