From rest-api-pipeline
Validates the schemas and data loaded by a dlt pipeline using mermaid diagrams and dashboard/MCP queries, and fixes types (Decimal for money), nested structures, and missing columns.
Slash command
/rest-api-pipeline:validate-data
After a successful pipeline load, verify the schema and data make sense. Fix data types, nested structures, and missing columns as needed.
Parse $ARGUMENTS:
- pipeline-name (optional): the dlt pipeline name. If omitted, infer from session context. If ambiguous, ask the user and stop.
- hints (optional, after --): specific validation concerns

Generate a mermaid diagram of the schema:
dlt pipeline <pipeline_name> schema --format mermaid
Show the mermaid diagram to the user. This gives a quick overview of tables, columns, types, and relationships (parent/child).
Tell the user to open the Workspace Dashboard:
dlt pipeline <pipeline_name> show
This opens a browser with table schemas, row counts, and sample data.
You also have an MCP server available with the right set of tools.
Ask the user if the schema and data look right. Common issues to address:
To fix data types, use processing_steps in the resource config to transform data before loading. Available steps: map, filter, yield_map.
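# requires: from decimal import Decimal
"processing_steps": [
    # str() first, so a value already parsed as float 19.99 becomes exactly Decimal("19.99")
    {"map": lambda item: {**item, "amount": Decimal(str(item["amount"]))}},
]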
IMPORTANT: NEVER convert monetary amounts or precision-sensitive values to float. Always use Decimal.
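To see why the rule above matters, here is a minimal sketch in plain Python. The helper name `amount_to_decimal` and its default field name are hypothetical. It converts through `str()` on the assumption that the API client has already parsed the JSON number into a float.

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly,
# so sums of money drift; Decimal arithmetic stays exact.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")


def amount_to_decimal(item, field="amount"):
    """Return item with item[field] converted to Decimal.

    Hypothetical helper for illustration. Items where the field is
    missing or null are returned unchanged.
    """
    value = item.get(field)
    if value is None:
        return item
    # Decimal(19.99) would capture the float's binary error,
    # while Decimal("19.99") is exact, hence the str() step.
    return {**item, field: Decimal(str(value))}
```

A named function like this could be used as the map step in place of the lambda, which also makes the null handling explicit.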
dlt auto-unnests nested arrays into child tables (e.g., results inside a response becomes <resource>__results). This is often fine for analytics. If the user wants a flat structure, use yield_map to flatten, or adjust data_selector to point deeper into the response.
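As a rough illustration of the flattening option, the generator below turns one parent record with a nested `results` array into one flat row per element. The function name and the `results` key are hypothetical, and wiring it in as a yield_map step is assumed to mirror the map step.

```python
def flatten_results(item, nested_key="results"):
    """Yield one flat dict per element of item[nested_key]."""
    # Fields of the parent record, without the nested array itself.
    parent = {k: v for k, v in item.items() if k != nested_key}
    children = item.get(nested_key) or []
    if not children:
        # Keep parents that have no nested rows instead of dropping them.
        yield parent
        return
    for child in children:
        # Child keys win on a name clash; rename beforehand if that is not wanted.
        yield {**parent, **child}
```

Because the parent fields are repeated on every row, this trades the parent/child tables for a single wider table, which suits small nested arrays better than large ones.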
Columns that are all-null on first load won't have inferred types. Options:
- Add columns hints to the resource config: "columns": {"field": {"data_type": "text"}}
- Pass group_by or other API params to populate the columns

Re-run the pipeline after changes (dev_mode gives a fresh dataset each time). Use debug-pipeline to inspect traces and load packages after each run. Inspect again with MCP or dlt pipeline <name> schema --format mermaid. Repeat until the user is happy with the schema.
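To decide which fields need explicit columns hints, a sample of extracted records can be scanned for fields that are null in every row. This is a sketch in plain Python: `all_null_fields` and `build_column_hints` are hypothetical helpers, the sample records are made up, and `text` is only a placeholder type that should be replaced with the real type of each field.

```python
def all_null_fields(records):
    """Return sorted names of fields that are None in every record where they appear."""
    seen = set()
    non_null = set()
    for record in records:
        for key, value in record.items():
            seen.add(key)
            if value is not None:
                non_null.add(key)
    return sorted(seen - non_null)


def build_column_hints(fields, data_type="text"):
    """Build a dict in the shape of the "columns" hint from the resource config."""
    return {name: {"data_type": data_type} for name in fields}


# Made-up sample: discount_code is null in every row, so it needs a hint.
sample = [
    {"id": 1, "amount": "10.50", "discount_code": None},
    {"id": 2, "amount": "3.20", "discount_code": None},
]
hints = build_column_hints(all_null_fields(sample))
```

The resulting dict can be pasted under "columns" in the resource config once the placeholder types have been corrected.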
Next steps: new-endpoint for more resources, view-data for querying, or the data-exploration toolkit for interactive notebooks and reports. If a run fails, use debug-pipeline.

Install the plugin:
npx claudepluginhub dlt-hub/dlthub-ai-workbench --plugin rest-api-pipeline