datacontract-test

Run the Data Contract CLI (`datacontract test`) against ODCS contracts in the project to verify the live data still conforms — schema, quality rules, and freshness. Handles two kinds of contracts with different semantics: output-port contracts under `src/output_ports/**/*.odcs.yaml` (tested against this project's Databricks warehouse — "am I still producing what I promised?") and input-port contracts under `src/input_ports/*.odcs.yaml` (tested against the upstream warehouse — "is upstream still producing what I trusted?"). Trigger when the user asks to "test the data contracts", "verify the data product matches its contract", "are we still contract-conformant", "check upstream drift", or "run the contract tests".

Invocation

Slash command: /dataproduct-builder-databricks:datacontract-test

User invocable · Model invocable · Inline context · Default effort


SKILL.md · 237 lines · ~3.5k tokens · Language: Shell · Last commit: May 29, 2026


Test ODCS data contracts against the live server

Run the Data Contract CLI (datacontract test) against contracts in the project to check whether the data currently produced by a warehouse still matches the schema and quality rules declared in the contract.

Two kinds of contracts live in this project and they test against different warehouses:

Output-port contracts at src/output_ports/v<N>/*.odcs.yaml — what this data product commits to produce. They test against this project's Databricks workspace. A failure means we are no longer producing what we promised.
Input-port contracts at src/input_ports/*.odcs.yaml — cached snapshots of what we trust upstream to produce. They test against the upstream provider's warehouse, using a server block from upstream's ODCS. A failure means upstream drifted from the contract we trusted; the consequence is that our output may break too. Treat input-port failures as an upstream incident, not a local bug.

When to use this vs. other skills

You changed a contract and want to know if the edit breaks consumers → use datacontract-edit (it edits, tests, and classifies the failure as breaking-or-not).
You want to verify existing contracts against current data, no edits → this skill.
A CI run failed the contract test step → this skill, to reproduce locally with --logs.

How to run this skill

Plan announcement (before Step 0)

Before running Step 0, print this plan to the user verbatim:

Running datacontract-test. I'll:

Pre-checks: confirm the datacontract CLI is on PATH and the server credentials are available.

Pick which contract(s) to test — defaults to all src/output_ports/**/*.odcs.yaml and src/input_ports/*.odcs.yaml.

Pick the server (defaults to production if the contract has one).

Run datacontract test per contract and capture the result.

Report pass/fail with per-rule detail; flag missing credentials separately from real failures.

Then proceed.

Step 0 — Pre-checks

Confirm uv run --quiet datacontract --version succeeds from the project root. If it fails, run uv sync (the init template seeds datacontract-cli[all] as a dev dep in pyproject.toml) and retry. If uv sync still doesn't make it available, stop and tell the user to verify datacontract-cli[all] is listed in pyproject.toml's [dependency-groups].dev. Do not propose uv tool install here — per-project venv is the convention.
Confirm at least one *.odcs.yaml exists under src/output_ports/**/ or src/input_ports/. If not, stop and tell the user there's nothing to test.
For each contract that will run, inspect its servers block and list the env vars the chosen server type needs (e.g. DATACONTRACT_DATABRICKS_TOKEN and DATACONTRACT_DATABRICKS_HTTP_PATH for Databricks, DATACONTRACT_SNOWFLAKE_USERNAME / ..._PASSWORD for Snowflake). If any are unset, show the list to the user and ask whether to continue (the CLI will fail fast for that server) or stop. Do not try to source credentials yourself.
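The credential check above can be sketched as a small shell function. This is a minimal sketch, assuming a POSIX-style shell; `missing_env_vars` is a hypothetical name, and its variable lists only cover the examples in this skill (Databricks, Snowflake password auth, Postgres, BigQuery via ADC). It prints variable names, never values, and reads nothing from files.

```shell
# Hypothetical helper: print the env vars a server type needs that are unset or empty.
# The lists mirror the examples in this skill and are not exhaustive.
missing_env_vars() {
  local required var value
  case "$1" in
    databricks) required="DATACONTRACT_DATABRICKS_TOKEN DATACONTRACT_DATABRICKS_HTTP_PATH" ;;
    snowflake)  required="DATACONTRACT_SNOWFLAKE_USERNAME DATACONTRACT_SNOWFLAKE_PASSWORD" ;;  # password auth only
    postgres)   required="DATACONTRACT_POSTGRES_USERNAME DATACONTRACT_POSTGRES_PASSWORD" ;;
    bigquery)   required="" ;;  # ADC needs no env vars
    *) echo "unknown server type: $1" >&2; return 2 ;;
  esac
  for var in $required; do
    # Indirect lookup of the variable named by $var; safe because the names are literals above.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "$var"
    fi
  done
  return 0
}
```

For example, `missing_env_vars databricks` prints nothing when both Databricks variables are set, which is the signal to proceed without asking.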

Step 1 — Select contracts

If the user named a specific contract file or data product id, resolve it to one file. Search both src/output_ports/**/*.odcs.yaml and src/input_ports/*.odcs.yaml.
If the user said "output contracts" / "input contracts" / "upstream drift", scope to one of those globs.
If they didn't, default to all ODCS files under both globs. List them, grouped by Output ports and Input ports so the user sees the two roles, then ask before running.
Remember the resolved list as CONTRACTS. For each entry, also remember its role (output or input) — Step 4 surfaces failures differently.
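The default selection can be sketched with `find`, run from the project root. `list_contracts` is a hypothetical helper; it assumes output-port contracts may sit at any depth under src/output_ports/ (matching the `**` glob) while input-port contracts sit directly under src/input_ports/. Each line carries the role that Step 4 needs.

```shell
# Hypothetical helper: print "<role><TAB><path>" for each ODCS contract,
# output ports first, then input ports, each group sorted by path.
# Run it from the project root.
list_contracts() {
  if [ -d src/output_ports ]; then
    # Any depth, matching src/output_ports/**/*.odcs.yaml
    find src/output_ports -type f -name '*.odcs.yaml' | LC_ALL=C sort |
      awk '{ print "output\t" $0 }'
  fi
  if [ -d src/input_ports ]; then
    # Top level only, matching src/input_ports/*.odcs.yaml
    find src/input_ports -maxdepth 1 -type f -name '*.odcs.yaml' | LC_ALL=C sort |
      awk '{ print "input\t" $0 }'
  fi
}
```

An empty result corresponds to the "nothing to test" case from Step 0.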

Step 2 — Select the server

For each contract in CONTRACTS:

If the contract has exactly one server, use it.
If it has multiple, default to production. If production isn't defined, ask the user which one.
Only pass --server all if the user explicitly asks to test every server.
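The server rule above, sketched as a hypothetical `pick_server` function. It assumes the server names have already been read from the contract's servers block (the extraction itself is out of scope here), and it never returns `all`, since that value is only used on explicit request.

```shell
# Hypothetical helper for the Step 2 rule. Arguments: the server names declared
# in one contract. Prints the server to test, or returns
#   1 when there are several servers and none is named production (ask the user),
#   2 when the contract declares no servers.
pick_server() {
  local name
  if [ "$#" -eq 0 ]; then
    return 2
  fi
  if [ "$#" -eq 1 ]; then
    echo "$1"
    return 0
  fi
  for name in "$@"; do
    if [ "$name" = production ]; then
      echo production
      return 0
    fi
  done
  return 1
}
```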

Step 3 — Run the test

For each contract:

uv run datacontract test <path-to-contract>.odcs.yaml --server <server> --logs

Where <path-to-contract> is the file resolved in Step 1 — typically src/output_ports/v<N>/<file>.odcs.yaml for output contracts, or src/input_ports/<file>.odcs.yaml for input contracts. The CLI does not care which directory; the role only matters for how Step 4 reports the result.

--logs ensures per-rule failure detail is in stdout — without it the CLI only prints a summary.
For every contract you intend to publish in Step 3b, write a JSON report too: add --output ./test-results/<contract>.json --output-format json. The entropy-data test-results publish verb reads only JSON or YAML — JUnit XML is rejected. Skip the file when not publishing.
Capture stdout and exit code per contract. Non-zero exit means at least one rule failed.

Run sequentially, not in parallel — the warehouse is the bottleneck and parallel runs muddy the log output.
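A sketch of the sequential loop, assuming a POSIX-style shell. `run_contract_test`, `run_all`, `report_path`, and the `DATACONTRACT_CMD` override are hypothetical; the override exists only so the loop can be dry-run without a warehouse. The naming rule for `./test-results/<contract>.json` (file name minus `.odcs.yaml`) is also an assumption. The CLI flags are the ones listed above.

```shell
# Hypothetical override so the loop can be dry-run; defaults to the real CLI via uv.
DATACONTRACT_CMD=${DATACONTRACT_CMD:-"uv run datacontract"}

# Hypothetical naming rule: ./test-results/<file name without .odcs.yaml>.json
report_path() {
  echo "./test-results/$(basename "$1" .odcs.yaml).json"
}

# Run one contract. Third argument "yes" also writes the JSON report for Step 3b.
run_contract_test() {
  local contract server publish
  contract=$1
  server=$2
  publish=${3:-no}
  if [ "$publish" = yes ]; then
    mkdir -p ./test-results
    # $DATACONTRACT_CMD is unquoted on purpose: "uv run datacontract" is three words.
    $DATACONTRACT_CMD test "$contract" --server "$server" --logs \
      --output "$(report_path "$contract")" --output-format json
  else
    $DATACONTRACT_CMD test "$contract" --server "$server" --logs
  fi
}

# Read "<contract><TAB><server><TAB><publish yes|no>" lines and run them one at a time.
# Prints "<contract><TAB><exit code>" per contract; a failure does not stop the loop.
run_all() {
  local tab contract server publish rc
  tab=$(printf '\t')
  while IFS=$tab read -r contract server publish; do
    rc=0
    # CLI output goes to stderr; stdin is closed so the CLI cannot eat the remaining lines.
    run_contract_test "$contract" "$server" "$publish" </dev/null >&2 || rc=$?
    printf '%s\t%s\n' "$contract" "$rc"
  done
}
```

Sending the CLI's own output to stderr keeps the `--logs` detail visible while stdout carries one result line per contract for the report.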

Step 3b — Publish results to Entropy Data (optional)

The platform's Data Quality panel reads test results published via entropy-data test-results publish. Run this step only for output-port contracts — input-port results belong to the upstream provider.

Ask the user — required confirmation gate:

Publish the test results to Entropy Data so they show up in the Data Quality panel? (yes / no)

If no, mark publish as skipped and continue to Step 4. Do not publish without an explicit ask — this writes server-side state visible to all viewers.

If yes, for each output-port contract tested:

uv run entropy-data test-results publish --file ./test-results/<contract>.json

Capture exit code per file. The CLI reads the JSON, infers the contract id and server, and uploads. If a publish fails, surface the CLI error and continue with the rest — don't abort the loop on one failure.
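The publish loop can be sketched the same way. `publish_reports` and the `ENTROPY_DATA_CMD` override are hypothetical; only the `entropy-data test-results publish --file` invocation comes from this skill. Call it only after the user answers yes, and only with output-port reports.

```shell
# Hypothetical override for dry runs; defaults to the real CLI via uv.
ENTROPY_DATA_CMD=${ENTROPY_DATA_CMD:-"uv run entropy-data"}

# Publish each report, continuing past failures.
# Prints "<report><TAB>published" or "<report><TAB>failed: exit <code>" per file
# and returns 1 if any publish failed.
publish_reports() {
  local report rc any_failed
  any_failed=0
  for report in "$@"; do
    rc=0
    # The CLI's own error text goes to stderr so it can be surfaced to the user.
    $ENTROPY_DATA_CMD test-results publish --file "$report" </dev/null >&2 || rc=$?
    if [ "$rc" -eq 0 ]; then
      printf '%s\tpublished\n' "$report"
    else
      printf '%s\tfailed: exit %s\n' "$report" "$rc"
      any_failed=1
    fi
  done
  return "$any_failed"
}
```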

Step 4 — Report

End with this two-part recap. Use the shared Status enum (AGENTS.md § Final-report Status enum). For this skill the relevant statuses are passed, failed, and skipped (missing creds, or user declined publish).

Part 1 — outcome table. One row per contract tested. Group the rows: output-port contracts first, then input-port contracts under a sub-header (so the reader sees the two roles at a glance).

| Contract | Role | Server | Result | Failures | Published | Details |
|---|---|---|---|---|---|---|
| `<contract-file>` | `output` / `input` | `<server>` | `passed` / `failed` / `skipped` | count or `—` | `published` / `skipped (user declined)` / `n/a (input port)` / `failed: <error>` | one line per failing rule (field + rule), or "missing env var: …" if skipped |

Part 2 — next steps. Bullet list, include only what applies. Treat output vs. input failures differently:

Output-port failures: surface the field and the violated check (e.g. orders.order_id: not_null violated for 17 rows). The fix is in this project — either the @dp.table definition is wrong, the contract is wrong, or the data is wrong. If the user wants a follow-up SQL to find the offending rows, suggest the shape but don't run it. If failures look like they came from a contract edit (rules tightening), point at datacontract-edit to classify breaking-vs-additive.
Input-port failures: this is upstream drift. Name the provider data product and output port (from the contract id and file name). The fix is not in this project — the user should contact the upstream owner, and in the meantime expect downstream output-port failures. Suggest re-running dataproduct-implement once upstream republishes a corrected contract, so the cached snapshot under src/input_ports/ refreshes.
For each skipped row: the exact env vars the user needs to set and where to get them (usually from the workspace admin, or via entropy-data connection get).
If failures look like a data quality issue (rules unchanged, data drifted), suggest investigating the upstream of the failing model — this skill does not auto-fix data.

If everything passed, write a single line: All <N> contracts pass against <server>.
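Two hypothetical helpers for the report: one maps a Step 3 exit code onto the `passed` / `failed` / `skipped` statuses, following the rule that a non-zero exit means at least one rule failed; the other prints the single-line recap only when every contract passed. Both are sketches; the Status enum itself lives in AGENTS.md.

```shell
# Map a Step 3 outcome to a status. Pass "skipped" when credentials were
# missing and the contract was never run.
status_for() {
  case "$1" in
    skipped) echo skipped ;;
    0) echo passed ;;
    *) echo failed ;;
  esac
}

# Usage: all_passed_line <server> <status>...
# Prints the one-line recap if every status is "passed"; returns 1 otherwise.
all_passed_line() {
  local server s
  [ "$#" -ge 2 ] || return 1   # need a server and at least one status
  server=$1
  shift
  for s in "$@"; do
    [ "$s" = passed ] || return 1
  done
  echo "All $# contracts pass against $server."
}
```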

Authentication examples by server type

The Data Contract CLI reads credentials from environment variables, not from the contract file. Only the connection topology (host, catalog, schema, etc.) belongs in the servers block. The Databricks example below is the primary case for this plugin; other server types follow the same pattern.

Databricks

ODCS server block:

servers:
  - server: production
    type: databricks
    host: adb-1234567890.7.azuredatabricks.net   # optional, can also come from env
    catalog: acme_catalog_prod
    schema: orders_latest

The datacontract CLI does not share auth state with the databricks CLI — a token must be supplied explicitly via DATACONTRACT_DATABRICKS_TOKEN. When surfacing missing credentials to the user, recommend the OAuth-first path; fall back to PAT only when OAuth isn't available.

Recommended — short-lived OAuth from the already-authenticated databricks CLI:

export DATACONTRACT_DATABRICKS_TOKEN=$(databricks auth token | jq -r .access_token)
export DATACONTRACT_DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/<warehouse-id>

The token is valid for about an hour and its literal value never lands in shell history. If it leaks, it stops working within the hour, a much smaller blast radius than a long-lived PAT.

Fallback — Personal Access Token (use when databricks auth token isn't available: PAT-only profile, OAuth refresh issue, headless shell):

export DATACONTRACT_DATABRICKS_TOKEN=dapi...
export DATACONTRACT_DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/<warehouse-id>

A PAT is long-lived until rotated. Scope it narrowly (read access to the data product's schema is enough). Typing the literal export at the prompt leaves the token in shell history, and putting it in .bashrc/.zshrc leaves it in plain text on disk; avoid the latter entirely.

CI — use a service-principal-issued token (M2M OAuth, or an SP-owned PAT), not a personal one, with SELECT scoped to the data product's schema. Set as a repository secret named DATACONTRACT_DATABRICKS_TOKEN.

Optional env vars:

export DATACONTRACT_DATABRICKS_SERVER_HOSTNAME=adb-...     # only needed if `host` is not in the server block

HTTP_PATH points at a SQL warehouse — not the Lakeflow pipeline cluster.
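A quick heuristic for catching the wrong HTTP path before a run. `is_warehouse_http_path` is hypothetical and not part of the datacontract CLI; it assumes warehouse paths have the `/sql/1.0/warehouses/<warehouse-id>` shape used above (older workspaces may show `/sql/1.0/endpoints/<id>`), whereas all-purpose cluster paths typically start with `sql/protocolv1/`.

```shell
# Heuristic only: succeed if the argument looks like a SQL warehouse HTTP path.
is_warehouse_http_path() {
  case "$1" in
    /sql/1.0/warehouses/?*) return 0 ;;
    /sql/1.0/endpoints/?*) return 0 ;;   # older naming for SQL warehouses
    *) return 1 ;;
  esac
}

# Example:
#   is_warehouse_http_path "$DATACONTRACT_DATABRICKS_HTTP_PATH" || echo "not a SQL warehouse path" >&2
```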

Snowflake

ODCS server block:

servers:
  - server: production
    type: snowflake
    account: abcdefg-xn12345
    database: ORDER_DB
    schema: ORDERS_PII_V2

Any env var prefixed DATACONTRACT_SNOWFLAKE_ is forwarded to the Snowflake connector with the prefix stripped and the rest lowercased.
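To make the forwarding rule concrete, this sketch reproduces the name mapping in shell. It is illustration only: `snowflake_connector_params` is a hypothetical helper that neither calls the CLI nor mirrors its internals, and it prints parameter names, never values.

```shell
# List the connector parameter names implied by exported DATACONTRACT_SNOWFLAKE_* vars:
# strip the prefix, lowercase the rest, sort and de-duplicate.
snowflake_connector_params() {
  env | sed -n 's/^DATACONTRACT_SNOWFLAKE_\([^=][^=]*\)=.*/\1/p' |
    tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sort -u
}
```

With the private-key exports shown further down, it would list `authenticator`, `private_key_passphrase`, `private_key_path`, `role`, `username`, and `warehouse`.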

Password auth

export DATACONTRACT_SNOWFLAKE_USERNAME=...
export DATACONTRACT_SNOWFLAKE_PASSWORD=...
export DATACONTRACT_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
export DATACONTRACT_SNOWFLAKE_ROLE=DATA_CONTRACT_TEST

Private key (JWT) auth — used for service accounts and CI:

export DATACONTRACT_SNOWFLAKE_USERNAME=SVC_DATACONTRACT
export DATACONTRACT_SNOWFLAKE_AUTHENTICATOR=SNOWFLAKE_JWT
export DATACONTRACT_SNOWFLAKE_PRIVATE_KEY_PATH=/secrets/snowflake_rsa.p8
export DATACONTRACT_SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=...   # only if encrypted
export DATACONTRACT_SNOWFLAKE_WAREHOUSE=COMPUTE_WH
export DATACONTRACT_SNOWFLAKE_ROLE=DATA_CONTRACT_TEST

BigQuery

ODCS server block:

servers:
  - server: production
    type: bigquery
    project: acme-data-prod
    dataset: orders

Service account key file

export DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH=/secrets/bq-sa.json

Application Default Credentials (ADC) — no env vars needed. Used automatically when DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH is unset. Works with gcloud auth application-default login for local runs and with Workload Identity Federation in CI.
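A small check of which BigQuery path applies, following the rule above. `bigquery_auth_mode` is hypothetical; it only inspects the variable and whether the file is readable, and does not validate the key or the ADC setup.

```shell
# Print "keyfile" or "adc"; return 1 if the variable points at an unreadable file.
bigquery_auth_mode() {
  local key
  key=${DATACONTRACT_BIGQUERY_ACCOUNT_INFO_JSON_PATH:-}
  if [ -z "$key" ]; then
    echo adc
  elif [ -r "$key" ]; then
    echo keyfile
  else
    echo "key file not readable: $key" >&2
    return 1
  fi
}
```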

Postgres

ODCS server block:

servers:
  - server: production
    type: postgres
    host: db.example.internal
    port: 5432
    database: analytics
    schema: public

Env vars:

export DATACONTRACT_POSTGRES_USERNAME=datacontract_ro
export DATACONTRACT_POSTGRES_PASSWORD=...

Other server types (Athena, SQL Server, Oracle, MySQL, Trino, DuckDB, Kafka, ...) follow the same DATACONTRACT_<TYPE>_<PARAM> env-var pattern; see the Data Contract CLI README for the full list.
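The naming pattern can be spelled out as a one-line helper. `datacontract_env_name` is hypothetical, and the pattern only gives the shape of a name; which parameters each server type actually reads still has to come from the CLI README.

```shell
# Build DATACONTRACT_<TYPE>_<PARAM> from a server type and a parameter name.
datacontract_env_name() {
  printf 'DATACONTRACT_%s_%s\n' "$1" "$2" | tr '[:lower:]' '[:upper:]'
}
```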

Constraints

Read-only against the warehouse. This skill runs datacontract test which executes SELECT queries; it never writes. Do not invoke datacontract publish, datacontract export, or entropy-data datacontracts put from this skill.
No edits to contracts or models. If a test fails, surface it — do not auto-patch the contract to make it pass. That defeats the purpose.
No credential sourcing. If env vars are missing, tell the user; don't read them from .env, ~/.aws, or anywhere else on the user's behalf.
Idempotent: re-running the skill against unchanged data produces the same report. The exception is rules that depend on time (freshness, row-count windows), whose results can change between runs; note that in the failure detail when relevant.
