From majestic-data
Validates pandas DataFrames using pandera with schema definitions, column checks, decorators, error collection, and schema inference. Ideal for ETL pipelines and data engineering.
Slash command: `/majestic-data:pandera-validation`
**Audience:** Data engineers validating pandas DataFrames.
**Goal:** Provide pandera patterns for schema validation and type checking.
Execute schema functions from `scripts/schemas.py`:

```python
from scripts.schemas import (
    create_user_schema,
    create_nullable_schema,
    create_date_range_schema,
    UserSchema,
    validate_with_errors,
    infer_and_export_schema,
)
```
Validate a DataFrame against a schema (raises on the first failure):

```python
from scripts.schemas import create_user_schema

schema = create_user_schema()
validated_df = schema.validate(df)
```
Collect all validation errors instead of stopping at the first one:

```python
from scripts.schemas import create_user_schema, validate_with_errors

schema = create_user_schema()
validated_df, errors = validate_with_errors(df, schema)

if errors:
    for err in errors:
        print(f"{err['column']}: {err['check']} - {err['failure_case']}")
```
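Use the class-based schema for direct validation and as a type hint:

```python
import pandas as pd
import pandera as pa

from scripts.schemas import UserSchema

# Validate against the class-based schema
UserSchema.validate(df)

# Use as a function type hint; @pa.check_types enforces it at call time
@pa.check_types
def process_users(df: pa.typing.DataFrame[UserSchema]) -> pd.DataFrame:
    return df.query("status == 'active'")
```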
Infer a schema from existing data and export it:

```python
from scripts.schemas import infer_and_export_schema

schema_export = infer_and_export_schema(df)
print(schema_export['python_code'])  # Python schema definition
print(schema_export['yaml'])         # YAML schema
```
| Check Type | Example | Description |
|---|---|---|
| Numeric | `Check.gt(0)`, `Check.in_range(0, 100)` | Value comparisons and ranges |
| String | `Check.str_matches(r'pattern')` | Regex match |
| Set membership | `Check.isin(['A', 'B'])` | Allowed values |
| Uniqueness | `unique=True` on `Column` | No duplicates |
| Nullable | `nullable=True` on `Column` | Allow nulls |
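Validate function inputs and outputs with decorators:

```python
import pandas as pd
import pandera as pa

# `schema`, `input_schema`, and `output_schema` are schemas defined elsewhere

# Validate the returned DataFrame
@pa.check_output(schema)
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

# Validate the argument named "df"
@pa.check_input(schema, "df")
def process_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(processed=True)

# Validate both the "df" argument and the return value
@pa.check_io(df=input_schema, out=output_schema)
def transform_data(df: pd.DataFrame) -> pd.DataFrame:
    return df.transform(...)
```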
| Use Case | Pandera | Alternative |
|---|---|---|
| DataFrame validation | ✓ | - |
| Type hints for DataFrames | ✓ | - |
| ETL pipeline checks | ✓ | Great Expectations |
| Record-level validation | - | Pydantic |
Dependencies:
pandera>=0.18
pandas
npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data