Profile PySpark DataFrames or Unity Catalog tables with AI to generate data quality rule candidates, define rules via Python classes or YAML, validate against DQEngine, run end-to-end checks splitting valid/quarantined rows, and persist rules to Delta tables, volumes, or Lakebase.
Create DQX quality rules (checks) for a PySpark DataFrame or Delta table. Use when the user asks to "add a DQX check", "define a data quality rule", "validate that column X is not null / unique / in a set", or wants checks expressed in YAML/JSON for storage. Covers DQRowRule, DQDatasetRule, DQForEachColRule, built-in check_funcs, filters, user_metadata, custom SQL/Python checks, and the declarative metadata form.
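As a rough illustration of the declarative metadata form mentioned above, the sketch below writes two checks as plain Python dicts. The key layout (`criticality`, `check.function`, `check.arguments`, `filter`, `user_metadata`) and the `is_not_null` / `is_in_list` argument names reflect the DQX documentation as best understood here and should be verified against the installed version. `find_structural_problems` is a hypothetical stand-in written for this example; it is not part of DQX, which provides its own validation through DQEngine.

```python
# Checks in the declarative (metadata) form: plain dicts that map directly to YAML/JSON.
# Assumption: key and argument names follow the DQX docs; verify against your version.
checks = [
    {
        "name": "id_is_not_null",
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
    },
    {
        "criticality": "warn",
        "filter": "country = 'DE'",  # only evaluate the check on rows matching this filter
        "check": {
            "function": "is_in_list",
            "arguments": {"column": "status", "allowed": ["new", "active", "closed"]},
        },
        "user_metadata": {"owner": "data-team"},
    },
]


def find_structural_problems(checks):
    """Hypothetical helper (not a DQX API): list structural problems, empty if none."""
    problems = []
    for i, rule in enumerate(checks):
        # Assumption: a missing criticality is treated as "error".
        if rule.get("criticality", "error") not in ("error", "warn"):
            problems.append(f"check {i}: criticality must be 'error' or 'warn'")
        check = rule.get("check")
        if not isinstance(check, dict) or "function" not in check:
            problems.append(f"check {i}: missing check.function")
        elif not isinstance(check.get("arguments", {}), dict):
            problems.append(f"check {i}: check.arguments must be a mapping")
    return problems


problems = find_structural_problems(checks)
```

A structural pre-check like this only catches malformed entries; whether a function name or argument is actually valid is something only DQX's own validation can confirm.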
Validate a PySpark DataFrame or Delta table against a set of DQX quality rules using DQEngine. Use when the user asks to "run data quality checks", "apply DQX rules to a DataFrame/table", "split valid and invalid rows", "quarantine bad records", or "integrate DQX into a streaming pipeline". Covers apply_checks, apply_checks_and_split, the by_metadata variants, and the shape of the result columns.
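The valid/quarantine split described above can be pictured with a small plain-Python stand-in. This is not DQX code: DQX works on Spark DataFrames through DQEngine, and the rule tuples, the `annotate` helper, and the sample rows here are hypothetical. The sketch assumes that only error-level failures send a row to quarantine and that failures are reported in `_errors` and `_warnings` result columns; DQX's exact handling of warn-level rows and its result column shape should be taken from its documentation.

```python
# Plain-Python stand-in for the split pattern; DQX itself operates on Spark DataFrames.
rows = [
    {"id": 1, "status": "active"},
    {"id": None, "status": "active"},
    {"id": 3, "status": "unknown"},
]

# Each hypothetical rule: (name, criticality, predicate returning True when the row passes).
rules = [
    ("id_is_not_null", "error", lambda r: r["id"] is not None),
    ("status_is_known", "warn", lambda r: r["status"] in {"new", "active", "closed"}),
]


def annotate(row):
    """Return a copy of the row with the names of failed error- and warn-level rules."""
    failed = [(name, crit) for name, crit, passes in rules if not passes(row)]
    return {
        **row,
        "_errors": [name for name, crit in failed if crit == "error"],
        "_warnings": [name for name, crit in failed if crit == "warn"],
    }


annotated = [annotate(r) for r in rows]
# Assumption of this sketch: only error-level failures quarantine a row.
valid = [r for r in annotated if not r["_errors"]]
quarantined = [r for r in annotated if r["_errors"]]
```

Keeping the failure details on each row, rather than only a pass/fail flag, is what makes a quarantine output useful for later triage.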
Run DQX validation end-to-end — read an input table or path, apply checks, and write valid and quarantined rows to output locations — in a single call. Use when the user asks for "apply and save", "quality-check a table and split the output", "DQX on a whole table", "save valid and invalid rows", or wants to drop DQX into a Lakeflow / workflow that runs on a table or path. Covers apply_checks_and_save_in_table, the by_metadata variant, InputConfig / OutputConfig, and incremental streaming mode.
Profile a DataFrame or table and generate DQX quality rule candidates with summary statistics. Use when the user asks to "profile a table", "generate DQX rules from data", "suggest data quality checks", "bootstrap a checks.yml", or "generate DLT expectations". Covers DQProfiler, DQGenerator, DQDltGenerator, the profiler workflow, sampling / filter options, and AI-assisted variants.
Load and save DQX checks (quality rules) to a file, workspace path, Unity Catalog volume, Delta table, Lakebase, or the DQX installation folder. Use when the user asks to "load DQX checks from YAML", "save checks to a Delta table", "read checks from a volume", "share checks across notebooks", or "use the DQX workspace install's default checks location". Covers every *ChecksStorageConfig and the matching load/save calls.
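Because checks in the metadata form are plain data, persisting them amounts to serializing a list of dicts. The sketch below round-trips one check through a local JSON file using only the standard library. `save_checks` and `load_checks` are hypothetical helpers for illustration; they are not the DQX load/save calls or storage config classes, which handle workspace paths, volumes, Delta tables, and Lakebase.

```python
import json
import tempfile
from pathlib import Path

# One check in the metadata form (assumed layout; verify against the DQX docs).
checks = [
    {
        "criticality": "error",
        "check": {"function": "is_not_null", "arguments": {"column": "id"}},
    },
]


def save_checks(checks, path):
    """Hypothetical helper (not a DQX API): write checks as JSON to a local file."""
    Path(path).write_text(json.dumps(checks, indent=2), encoding="utf-8")


def load_checks(path):
    """Hypothetical helper (not a DQX API): read checks back as a list of dicts."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round-trip through a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    checks_file = Path(tmp) / "checks.json"
    save_checks(checks, checks_file)
    reloaded = load_checks(checks_file)
```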
Simplified Data Quality checking at Scale for PySpark Workloads on streaming and standard DataFrames.
The complete documentation is available at: https://databrickslabs.github.io/dqx/
Please see the project's contribution guidance for how to contribute (build, test, and submit a PR).
Please note that this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees. Please do not submit a support ticket relating to any issues arising from the use of this project.
Any issues discovered through the use of this project should be filed as GitHub Issues on this repository. They will be reviewed as time permits, but no formal SLAs for support exist.
npx claudepluginhub databrickslabs/dqx --plugin dqx
Databricks development toolkit with skills for data engineering, ML, and AI agents plus MCP tools for direct Databricks operations
Skills for working with Bauplan data lakehouses. Includes data exploration, pipeline creation, safe S3 ingestion, pipeline debugging, data assessment, and quality check generation.
Spec-Driven Development framework for Data Engineering — 58 agents, 24 KB domains, 5-phase SDD workflow, 31 commands
Claude Code skill pack for Databricks (24 skills)
DataHub development and interaction toolkit with connector planning, PR review, catalog search, metadata enrichment, lineage tracing, data quality management, and connection setup skills
Databricks skills for CLI, Apps, Unity Catalog, Model Serving, Declarative Automation Bundles (DABs), and more.