From majestic-data
Provides Python data validation functions and pipelines for DataFrames using custom checks, Pydantic, Pandera, and Great Expectations. Includes schema evolution and pytest assertions.
Slash command: /majestic-data:data-validation
**Audience:** Data engineers building validation pipelines.
Goal: Provide validation patterns for custom business rules.
Framework-specific skills:

- pydantic-validation - Record-level validation with Pydantic
- pandera-validation - DataFrame schema validation
- great-expectations - Pipeline expectations and monitoring

Execute validation functions from scripts/validators.py:
```python
from scripts.validators import (
    ValidationResult,
    DataValidator,
    validate_no_duplicates,
    validate_referential_integrity,
    validate_date_range,
    validate_value_in_set,
    run_validation_pipeline,
    validate_with_schema_version,
    assert_schema_match,
    assert_no_nulls,
    assert_unique,
    assert_values_in_set,
)
```
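The import list names `validate_with_schema_version`, but this page never shows it in use. As a rough illustration of what version-aware validation can look like, here is a minimal sketch. The `SCHEMAS` registry, the function signature, and the shape of the returned dict are all assumptions made for this example and may differ from the real `scripts/validators.py`.

```python
import pandas as pd

# Hypothetical registry: each schema version lists the columns a frame must contain.
SCHEMAS = {
    1: {"id", "email"},
    2: {"id", "email", "created_at"},  # v2 adds created_at
}


def validate_with_schema_version(df: pd.DataFrame, version: int) -> dict:
    """Assumed signature: compare the frame's columns with one schema version."""
    if version not in SCHEMAS:
        raise ValueError(f"Unknown schema version: {version}")
    required = SCHEMAS[version]
    missing = sorted(required - set(df.columns))
    extra = sorted(set(df.columns) - required)
    # Extra columns are reported but do not fail the check in this sketch
    return {"version": version, "passed": not missing, "missing": missing, "extra": extra}


legacy = pd.DataFrame({"id": [1], "email": ["a@example.com"]})
v1_report = validate_with_schema_version(legacy, 1)
v2_report = validate_with_schema_version(legacy, 2)
```

Validating the same frame against two versions makes it clear whether older data still satisfies a newer schema: here `legacy` passes version 1 and fails version 2 because `created_at` is missing.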
| Use Case | Framework |
|---|---|
| API request/response | Pydantic |
| Record-by-record ETL | Pydantic |
| DataFrame validation | Pandera |
| Type hints for DataFrames | Pandera |
| Pipeline monitoring | Great Expectations |
| Data warehouse checks | Great Expectations |
| Custom business rules | Custom functions (this skill) |
```python
from scripts.validators import validate_no_duplicates, validate_referential_integrity

# Check duplicates
result = validate_no_duplicates(df, cols=['id'])
if not result.passed:
    print(f"Error: {result.message}")
    print(result.failed_rows)

# Check referential integrity
result = validate_referential_integrity(df, 'user_id', users_df, 'id')
```
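The calls above rely only on a result object exposing `passed`, `message`, and `failed_rows`. For readers who want to see how such checks might work internally, the sketch below is one plausible implementation with pandas. It is not the code in `scripts/validators.py`; the dataclass layout, the message wording, and the choice to ignore null foreign keys are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

import pandas as pd


@dataclass
class ValidationResult:
    """Assumed shape: only the attributes the usage above reads."""
    passed: bool
    message: str
    failed_rows: Optional[pd.DataFrame] = None


def validate_no_duplicates(df: pd.DataFrame, cols: list[str]) -> ValidationResult:
    # keep=False flags every member of a duplicate group, not just the repeats
    dupes = df[df.duplicated(subset=cols, keep=False)]
    if dupes.empty:
        return ValidationResult(True, f"No duplicates in {cols}")
    return ValidationResult(False, f"{len(dupes)} duplicate rows in {cols}", dupes)


def validate_referential_integrity(
    df: pd.DataFrame, fk_col: str, ref_df: pd.DataFrame, ref_col: str
) -> ValidationResult:
    # Assumption: null foreign keys are skipped; only non-null values must resolve
    present = df[fk_col].notna()
    orphans = df[present & ~df[fk_col].isin(ref_df[ref_col])]
    if orphans.empty:
        return ValidationResult(True, f"All {fk_col} values exist in {ref_col}")
    return ValidationResult(False, f"{len(orphans)} orphaned {fk_col} values", orphans)


orders = pd.DataFrame({"id": [1, 2, 2], "user_id": [10, 11, 99]})
users = pd.DataFrame({"id": [10, 11]})

dup_result = validate_no_duplicates(orders, cols=["id"])
ref_result = validate_referential_integrity(orders, "user_id", users, "id")
```

Returning the offending rows alongside the pass/fail flag lets the caller decide whether to log, quarantine, or drop them.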
```python
from scripts.validators import DataValidator, validate_no_duplicates, validate_date_range

validator = DataValidator()
validator.add_check(lambda df: validate_no_duplicates(df, ['id']))
validator.add_check(lambda df: validate_date_range(df, 'created_at', '2020-01-01', '2025-12-31'))

results = validator.validate(df)
if not results['passed']:
    for check in results['checks']:
        if not check['passed']:
            print(f"Failed: {check['message']}")
```
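The usage above implies that `DataValidator` collects callables and returns a dict with an overall `passed` flag and a per-check list. A minimal sketch of that pattern follows. The class body, the simplified `ValidationResult`, and the `no_negative_amounts` check are hypothetical stand-ins written for this example, not the skill's actual code.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class ValidationResult:
    passed: bool
    message: str


Check = Callable[[pd.DataFrame], ValidationResult]


class DataValidator:
    """Hypothetical stand-in for the class in scripts/validators.py."""

    def __init__(self) -> None:
        self._checks: list[Check] = []

    def add_check(self, check: Check) -> "DataValidator":
        self._checks.append(check)
        return self  # returning self allows chaining; an assumption of this sketch

    def validate(self, df: pd.DataFrame) -> dict:
        # Run every check even after a failure so the report is complete
        outcomes = [check(df) for check in self._checks]
        return {
            "passed": all(o.passed for o in outcomes),
            "checks": [{"passed": o.passed, "message": o.message} for o in outcomes],
        }


def no_negative_amounts(df: pd.DataFrame) -> ValidationResult:
    # Example business rule, invented for illustration
    bad = int((df["amount"] < 0).sum())
    return ValidationResult(bad == 0, f"{bad} negative amounts")


validator = DataValidator()
validator.add_check(no_negative_amounts)
validator.add_check(lambda df: ValidationResult(not df.empty, "frame is non-empty"))

report = validator.validate(pd.DataFrame({"amount": [5, -1, 3]}))
```

Running all checks instead of stopping at the first failure costs a little time but gives a full picture of data quality in one pass.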
```python
from scripts.validators import run_validation_pipeline

config = {
    'unique_columns': ['id'],
    'date_ranges': {
        'created_at': ('2020-01-01', '2025-12-31'),
        'updated_at': ('2020-01-01', '2025-12-31')
    }
}

clean_df, results = run_validation_pipeline(df, config)
```
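The config-driven call above returns a cleaned frame together with results, but this page does not say how rows are cleaned. The sketch below assumes one reasonable behavior: repeated keys are dropped (keeping the first occurrence) and rows whose dates are out of range or unparseable are removed. The result-dict keys (`check`, `passed`, `failed_count`) are also invented for this example.

```python
import pandas as pd


def run_validation_pipeline(df: pd.DataFrame, config: dict) -> tuple[pd.DataFrame, list[dict]]:
    """Sketch: apply each configured rule, record the outcome, drop offending rows."""
    results = []
    clean = df.copy()

    unique_cols = config.get("unique_columns")
    if unique_cols:
        dup_mask = clean.duplicated(subset=unique_cols, keep="first")
        results.append({"check": "unique_columns", "passed": not dup_mask.any(),
                        "failed_count": int(dup_mask.sum())})
        clean = clean[~dup_mask]

    for col, (start, end) in config.get("date_ranges", {}).items():
        # errors="coerce" turns unparseable values into NaT, which then fail the range test
        dates = pd.to_datetime(clean[col], errors="coerce")
        in_range = dates.between(pd.Timestamp(start), pd.Timestamp(end))
        results.append({"check": f"date_range:{col}", "passed": bool(in_range.all()),
                        "failed_count": int((~in_range).sum())})
        clean = clean[in_range]

    return clean.reset_index(drop=True), results


raw = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "created_at": ["2021-05-01", "2021-05-01", "2019-12-31", "not a date"],
})
config = {"unique_columns": ["id"], "date_ranges": {"created_at": ("2020-01-01", "2025-12-31")}}
clean_df, results = run_validation_pipeline(raw, config)
```

Keeping the rules in a plain dict means the same pipeline can be reused across datasets by swapping the config, for example by loading it from YAML or JSON.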
```python
from scripts.validators import assert_schema_match, assert_no_nulls, assert_unique

# In pytest
def test_data_quality():
    assert_schema_match(df, {'id': 'int64', 'email': 'object'})
    assert_no_nulls(df, ['id', 'email'])
    assert_unique(df, ['id'])
```
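Assertion helpers like these are typically thin wrappers that raise `AssertionError` with a descriptive message, which pytest then reports as a test failure. The following is a sketch of how they could be written. The bodies and message formats are assumptions, not the contents of `scripts/validators.py`.

```python
import pandas as pd


def assert_schema_match(df: pd.DataFrame, expected: dict[str, str]) -> None:
    # Compare dtype names as strings, e.g. "int64" or "float64"
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    missing = set(expected) - set(actual)
    assert not missing, f"Missing columns: {sorted(missing)}"
    wrong = {c: (actual[c], t) for c, t in expected.items() if actual[c] != t}
    assert not wrong, f"Dtype mismatches (actual, expected): {wrong}"


def assert_no_nulls(df: pd.DataFrame, cols: list[str]) -> None:
    null_counts = df[cols].isna().sum()
    offenders = null_counts[null_counts > 0].to_dict()
    assert not offenders, f"Null values found: {offenders}"


def assert_unique(df: pd.DataFrame, cols: list[str]) -> None:
    n_dupes = int(df.duplicated(subset=cols).sum())
    assert n_dupes == 0, f"{n_dupes} duplicate rows on {cols}"


sample = pd.DataFrame({
    "id": pd.Series([1, 2, 3], dtype="int64"),
    "score": pd.Series([0.5, 0.7, None], dtype="float64"),
})

assert_schema_match(sample, {"id": "int64", "score": "float64"})  # passes
assert_unique(sample, ["id"])                                     # passes
# assert_no_nulls(sample, ["score"]) would raise: "score" contains one null
```

Putting the offending columns or counts into the assertion message saves a debugging round trip when a test fails in CI.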
Dependencies: pandas
npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data