ssmd-dq-run

How to run ssmd DQ checks locally and in-cluster, interpret scores, trigger email reports, and verify results. Use when running data quality checks, re-sending DQ emails, or verifying pipeline health after deployments or backfills.

Invocation

Slash command: `/dlaw:ssmd-dq-run` · User invocable · Model invocable · Inline context · Default effort

SKILL.md: 152 lines · ~1.2k tokens

Stats: Stars: 1 · Maintenance: Excellent · Last commit: Apr 4, 2026

ssmd-dq-run

Procedures for running ssmd Data Quality checks and interpreting results.

Source Files

File	Purpose
`data/dq.py`	DQRunner engine: 13 checks, scoring, CLI
`data/dq_email.py`	Email report wrapper: runs all feeds, HTML output
`data/Dockerfile`	DQ image: python:3.12-slim + duckdb + gcloud monitoring

Running DQ Locally

Requires Google Application Default Credentials for GCS access. Set them up once with `gcloud auth application-default login`.

```bash
# Single feed
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto

# With verbose progress
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto --verbose

# JSON output (for programmatic use)
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto --json

# Feeds that need an explicit --prefix (see Feed Parameters)
uv run data/dq.py --date 2026-02-17 --feed kraken-futures --stream futures --prefix kraken-futures
uv run data/dq.py --date 2026-02-17 --feed polymarket --stream markets --prefix polymarket
```

All Three Feeds

Run all three feeds for full pipeline verification. The commands are independent, so they can also be run in parallel (for example, from separate terminals):

```bash
uv run data/dq.py --date 2026-02-17 --feed kalshi --stream crypto
uv run data/dq.py --date 2026-02-17 --feed kraken-futures --stream futures --prefix kraken-futures
uv run data/dq.py --date 2026-02-17 --feed polymarket --stream markets --prefix polymarket
```

Feed Parameters

Feed	`--feed`	`--stream`	`--prefix`
Kalshi	`kalshi`	`crypto`	(default: `kalshi`)
Kraken Futures	`kraken-futures`	`futures`	`kraken-futures`
Polymarket	`polymarket`	`markets`	`polymarket`
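A minimal sketch that drives the three runs from the Feed Parameters table and executes them concurrently, collecting one exit code per feed. `FEEDS`, `dq_argv`, and `run_all_feeds` are hypothetical names, not part of `data/dq.py`; the sketch assumes it is launched from the repo root with `uv` on `PATH`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# (feed, stream, prefix) from the Feed Parameters table.
# None means "omit --prefix and let dq.py use its default".
FEEDS = [
    ("kalshi", "crypto", None),
    ("kraken-futures", "futures", "kraken-futures"),
    ("polymarket", "markets", "polymarket"),
]


def dq_argv(date, feed, stream, prefix=None):
    """Build the same command line shown in the examples above for one feed."""
    argv = ["uv", "run", "data/dq.py", "--date", date,
            "--feed", feed, "--stream", stream]
    if prefix is not None:
        argv += ["--prefix", prefix]
    return argv


def run_all_feeds(date, run=subprocess.run):
    """Run dq.py for every feed concurrently and return {feed: exit code}.

    `run` is injectable so the fan-out can be tested without uv or GCS access.
    """
    def run_one(spec):
        feed, stream, prefix = spec
        proc = run(dq_argv(date, feed, stream, prefix))
        return feed, proc.returncode

    # One thread per feed: each thread only waits on its child process.
    with ThreadPoolExecutor(max_workers=len(FEEDS)) as pool:
        return dict(pool.map(run_one, FEEDS))
```

For example, `run_all_feeds("2026-02-17")` returns a dict of exit codes keyed by feed, where 1 means at least one check failed for that feed. Threads are sufficient because each worker just waits on a child process; terminal output from the three runs will interleave.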

Running DQ In-Cluster

The DQ CronJob runs at 03:30 UTC daily (after parquet-gen at 02:00 UTC).

Manifest: clusters/gke-prod/apps/ssmd/cronjobs/dq-daily.yaml

Trigger a manual DQ email run

```bash
# MMDD is a placeholder (e.g. 0217); Job names must be unique within the namespace
kubectl create job --from=cronjob/ssmd-dq-daily ssmd-dq-manual-MMDD -n ssmd
```

Watch progress

```bash
kubectl logs -n ssmd job/ssmd-dq-manual-MMDD -f
```
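The `MMDD` in the Job name is a placeholder. A small hypothetical helper (`manual_job_commands` is not part of the repo) that fills it in from a date and returns both commands as argument lists suitable for `subprocess.run`:

```python
from datetime import date


def manual_job_commands(run_day: date, namespace: str = "ssmd"):
    """Return (create, logs) kubectl argv lists with MMDD filled in from run_day."""
    job = f"ssmd-dq-manual-{run_day.strftime('%m%d')}"
    create = ["kubectl", "create", "job", "--from=cronjob/ssmd-dq-daily",
              job, "-n", namespace]
    logs = ["kubectl", "logs", "-n", namespace, f"job/{job}", "-f"]
    return create, logs
```

A second manual run on the same day needs a different name, because `kubectl create job` refuses to create a Job whose name already exists in the namespace.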

Re-run for a specific date

By default the CronJob checks yesterday's data. To run it for a specific date, render a Job from the CronJob, inject `--date`, and apply it:

```bash
kubectl create job --from=cronjob/ssmd-dq-daily ssmd-dq-rerun-MMDD -n ssmd --dry-run=client -o yaml | \
  sed 's|dq_email.py|dq_email.py --date 2026-02-17|' | \
  kubectl apply -f -
```
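The same render, rewrite, and apply pipeline sketched in Python. `inject_date` and `rerun_for_date` are hypothetical names; `inject_date` mirrors the `sed` expression (first match per line only). One assumption to check against the manifest, which this page does not show: the substitution is only safe if `dq_email.py` appears inside a shell command string. If the manifest lists it as its own `args` element, the result becomes a single argument containing spaces, and `--date` and the date should instead be added as separate list items.

```python
import re
import subprocess


def inject_date(manifest_yaml: str, run_date: str) -> str:
    """Mirror `sed 's|dq_email.py|dq_email.py --date <run_date>|'`.

    As with sed without the g flag, only the first match on each line changes.
    """
    pattern = re.compile(r"dq_email\.py")
    return "\n".join(
        pattern.sub(lambda m: f"{m.group(0)} --date {run_date}", line, count=1)
        for line in manifest_yaml.split("\n")
    )


def rerun_for_date(run_date: str, job_name: str, namespace: str = "ssmd") -> None:
    """Render a Job from the CronJob, inject --date, and apply it (needs cluster access)."""
    rendered = subprocess.run(
        ["kubectl", "create", "job", "--from=cronjob/ssmd-dq-daily", job_name,
         "-n", namespace, "--dry-run=client", "-o", "yaml"],
        check=True, capture_output=True, text=True,
    ).stdout
    subprocess.run(
        ["kubectl", "apply", "-f", "-"],
        input=inject_date(rendered, run_date), check=True, text=True,
    )
```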

Interpreting Scores

Grades

Grade	Score Range	Meaning
GREEN	>= 98	Pipeline healthy, all checks passing
YELLOW	>= 85 and < 98	Minor issues, investigate when convenient
RED	< 85	Significant issues, investigate promptly
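A sketch of the banding as a function, to make the boundaries explicit (98 is GREEN, 85 is YELLOW). `grade` is a hypothetical name written from the table, not taken from `dq.py`.

```python
def grade(score: float) -> str:
    """Map a 0-100 DQ score to GREEN / YELLOW / RED using the bands above."""
    if score >= 98:
        return "GREEN"
    if score >= 85:
        return "YELLOW"
    return "RED"
```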

Check Statuses

Status	Weight	Meaning
pass	1.0	Check passed
warn	0.7	Threshold exceeded but not critical
fail	0.0	Check failed
skip	excluded	Not enough data to run, excluded from score

Score = 100 × (mean weight of all checks that were not skipped).
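A minimal sketch of that formula using the weights from the Check Statuses table. `WEIGHTS` and `dq_score` are hypothetical names, and returning `None` when every check is skipped is an assumption, since this page does not say how `dq.py` scores that case.

```python
# Status weights from the Check Statuses table; "skip" is excluded entirely.
WEIGHTS = {"pass": 1.0, "warn": 0.7, "fail": 0.0}


def dq_score(statuses):
    """Return 100 * the mean weight of non-skipped checks.

    Returns None when every check was skipped (assumed behavior, see above).
    """
    weights = [WEIGHTS[s] for s in statuses if s != "skip"]
    if not weights:
        return None
    return 100 * sum(weights) / len(weights)
```

The arithmetic has a practical consequence: if all 13 checks run, a single `warn` gives (12 × 1.0 + 0.7) / 13 × 100 ≈ 97.7, which is already below the GREEN cutoff of 98.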

Exit Codes

`dq.py` exits 1 if any check has status `fail`.
`dq_email.py` always exits 0; the email itself is the alert mechanism.
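A hedged sketch of consuming `dq.py --json` from a script while respecting the exit-code contract. `run_dq_json` is a hypothetical name, and it assumes `--json` writes only the JSON document to stdout; the report's schema is not documented here, so inspect it before relying on specific keys. Note that an uncaught Python exception also exits 1, which is why a JSON parse error is left to surface.

```python
import json
import subprocess


def run_dq_json(argv, run=subprocess.run):
    """Run dq.py with --json; return (has_failed_checks, parsed_report).

    Exit 1 is read as "at least one check has status fail". A crash also
    exits 1, but then stdout is normally not valid JSON, so json.loads raises.
    """
    proc = run(argv + ["--json"], capture_output=True, text=True)
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"dq.py exited {proc.returncode}: {proc.stderr.strip()}")
    return proc.returncode == 1, json.loads(proc.stdout)
```

For example: `run_dq_json(["uv", "run", "data/dq.py", "--date", "2026-02-17", "--feed", "kalshi", "--stream", "crypto"])`. Do not apply the same logic to `dq_email.py`, whose exit code is always 0.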

Notebook / Programmatic Usage

```python
# Run from data/ (or put data/ on sys.path) so that dq.py is importable
from dq import DQRunner

runner = DQRunner(bucket="ssmd-data", feed="kalshi", stream="crypto")
results = runner.run("2026-02-12")
results.summary()       # print human-readable report
results.score()         # float 0-100
results.to_json()       # JSON string

# Ad-hoc queries via the shared DuckDB connection
# (.fetchdf() returns a pandas DataFrame, so pandas must be installed)
runner.con.execute(
    "SELECT * FROM read_parquet('gcs://ssmd-data/kalshi/crypto/2026-02-12/ticker_*.parquet') LIMIT 10"
).fetchdf()

# Date range
all_results = runner.run_range("2026-02-10", "2026-02-17")
```

Email Report

dq_email.py runs all 3 feeds, generates an HTML email with per-feed grades and check details, and sends it via SMTP.

Required env vars: `SMTP_USER`, `SMTP_PASS`, `SMTP_TO`. Optional: `SMTP_HOST` (default: smtp.gmail.com), `SMTP_PORT` (default: 587).

These are provided in-cluster via the ssmd-smtp-credentials Secret.
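`dq_email.py`'s own SMTP code is not shown on this page, so the following is only a standard-library sketch of how the documented variables and defaults fit together. The function names are hypothetical, and using `SMTP_USER` as the From address and STARTTLS (the usual setup for port 587) are assumptions.

```python
import os
import smtplib
from email.message import EmailMessage

REQUIRED = ("SMTP_USER", "SMTP_PASS", "SMTP_TO")


def load_smtp_config(env=None):
    """Read the documented SMTP_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required env vars: {', '.join(missing)}")
    return {
        "host": env.get("SMTP_HOST", "smtp.gmail.com"),
        "port": int(env.get("SMTP_PORT", "587")),
        "user": env["SMTP_USER"],
        "password": env["SMTP_PASS"],
        "to": env["SMTP_TO"],
    }


def build_report(cfg, subject, html):
    """Build a multipart message: plain-text fallback plus the HTML body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = cfg["user"]  # assumption: the sender is the SMTP login
    msg["To"] = cfg["to"]
    msg.set_content("This DQ report is best viewed in an HTML-capable client.")
    msg.add_alternative(html, subtype="html")
    return msg


def send_report(cfg, msg):
    """Send over SMTP with STARTTLS, then authenticate with the configured login."""
    with smtplib.SMTP(cfg["host"], cfg["port"]) as smtp:
        smtp.starttls()
        smtp.login(cfg["user"], cfg["password"])
        smtp.send_message(msg)
```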

Post-Deploy / Post-Backfill Verification

After deploying a new DQ version or backfilling parquet data:

1. Run DQ locally for all 3 feeds (see commands above).
2. Verify that the checks you targeted now report `pass`.
3. Optionally trigger an in-cluster email run: kubectl create job --from=cronjob/ssmd-dq-daily ...
4. Verify that the email arrives with the corrected scores.
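A backfill usually spans several days, while the CLI takes one `--date` per run. A minimal sketch that loops the local check over an inclusive date range for all three feeds and reports every (day, feed) pair that did not exit 0. `FEEDS`, `days`, and `verify_backfill` are hypothetical names; runs are sequential for simplicity, and the sketch assumes it is launched from the repo root.

```python
import subprocess
from datetime import date, timedelta

# feed -> (stream, prefix) from Feed Parameters; None means "omit --prefix"
FEEDS = {
    "kalshi": ("crypto", None),
    "kraken-futures": ("futures", "kraken-futures"),
    "polymarket": ("markets", "polymarket"),
}


def days(start: str, end: str):
    """Yield ISO dates from start to end, both inclusive."""
    day, last = date.fromisoformat(start), date.fromisoformat(end)
    while day <= last:
        yield day.isoformat()
        day += timedelta(days=1)


def verify_backfill(start: str, end: str, run=subprocess.run):
    """Run dq.py per feed per day; return the (day, feed) pairs that exited non-zero."""
    problems = []
    for day in days(start, end):
        for feed, (stream, prefix) in FEEDS.items():
            argv = ["uv", "run", "data/dq.py", "--date", day,
                    "--feed", feed, "--stream", stream]
            if prefix:
                argv += ["--prefix", prefix]
            if run(argv).returncode != 0:
                problems.append((day, feed))
    return problems
```

An empty list means every feed passed on every backfilled day (no check had status `fail`).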

Image Build

The DQ image is built from `data/Dockerfile`. Builds are triggered by `dq-v*` tags in the 899bushwick repo (not ssmd).

See the ssmd-deploy skill for full deployment procedure.
