From rest-api-pipeline
Debugs and inspects dlt pipelines after runs, checking traces, load packages, schemas, data, and errors like missing credentials or failed jobs. Use post-execution.
Slash command
/rest-api-pipeline:debug-pipeline
**Essential Reading** https://dlthub.com/docs/reference/explainers/how-dlt-works
Parse $ARGUMENTS:
- pipeline-name (optional): the dlt pipeline name. If omitted, infer from session context. If ambiguous, ask the user and stop.
- hints (optional, after --): specific issue to investigate

Always do this first before any pipeline debugging:
IMPORTANT: Before making changes, note the current values in config files and pipeline code so you can restore them exactly. You are changing the user's files — only revert what YOU changed.
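One way to make the "note the current values" step mechanical is to copy the files aside before touching them. The sketch below is a minimal illustration using only the standard library; `snapshot_files` and `restore_files` are hypothetical helper names, not part of dlt.

```python
import shutil
from pathlib import Path


def snapshot_files(paths, backup_dir):
    """Copy each existing file into backup_dir; return {original: backup or None}."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    snapshot = {}
    for index, path in enumerate(map(Path, paths)):
        if path.exists():
            # Prefix with the index so files with the same name do not collide.
            backup = backup_dir / f"{index}_{path.name}"
            shutil.copy2(path, backup)
            snapshot[path] = backup
        else:
            # The file did not exist before debugging; restoring means deleting it.
            snapshot[path] = None
    return snapshot


def restore_files(snapshot):
    """Put every file back exactly as it was when snapshot_files() ran."""
    for path, backup in snapshot.items():
        if backup is None:
            path.unlink(missing_ok=True)
        else:
            shutil.copy2(backup, path)
```

A whole-file restore is only appropriate if nobody else edited the files in the meantime. If the user may have changed them during the session, treat the snapshot as a reference for a manual diff and revert only the lines you added.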
Set log level to INFO in .dlt/config.toml:
[runtime]
log_level="INFO"
Show HTTP error response bodies (hidden by default!):
[runtime]
http_show_error_body = true
Add progress logging to the dlt.pipeline() call (NOT pipeline.run() — that argument doesn't exist):
pipeline = dlt.pipeline(..., progress="log")
This shows HTTP requests being made, data extracted, pagination steps, and normalize/load progress. Essential for diagnosing any issue. Essential reading if problems PERSIST: https://dlthub.com/docs/general-usage/http/rest-client.md
uv run python <source>_pipeline.py
Common exceptions and what they mean:
- ConfigFieldMissingException: config or secrets are missing. Inspect the exception message.
- PipelineFailedException: the pipeline failed in one of its steps. Inspect the exception trace to find the root cause, and find the load_id to identify the load package that failed.

In the extract step, most exceptions come from source/resource code that you wrote!
Suggest running the pipeline before asking the user to fill in credentials:
Expected: a ConfigFieldMissingException or 401 Unauthorized error confirming:
Tell the user what credentials to fill in and how to get them. If credentials are unknown, research the data source (web search for API docs, auth setup guides — similar to what find-source does).
After any run (success or failure), use the dlt CLI for inspection:
A pipeline that runs for a long time is suspicious but MAY be normal (large datasets). Analyze stdout before killing it:
Paginator loops forever, with repeated requests to the same URL or page:
- "paginator" in the resource config.
- OffsetPaginator/PageNumberPaginator without stop_after_empty_page=True require total_path or maximum_offset/maximum_page, otherwise they loop forever.
- JSONResponseCursorPaginator with wrong cursor_path → cursor never advances → infinite loop.

Silent retries look like a hang, because the pipeline may be retrying failed HTTP requests:
.dlt/config.toml for faster failure during debugging:
[runtime]
request_timeout = 15
request_max_attempts = 2
Working but slow — each request returns new data and URL changes. Use .add_limit(N) to cap pages during development.
Can't tell which resource is stuck in a multi-resource pipeline — switch to sequential extraction:
[extract]
next_item_mode = "fifo"
This makes one resource complete fully before the next starts, making logs much easier to follow. Ref: https://dlthub.com/docs/reference/performance.md (extraction modes)
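The paginator rule from the hang checklist above (offset and page-number paginators need a stop condition) can be checked against a config before running anything. The sketch below is a plain-Python lint, not a dlt API: `paginator_may_loop_forever` is a hypothetical helper, the dict shape with `"type": "offset"` / `"type": "page_number"` is assumed to match how the resource declares its paginator, and a missing key is deliberately treated as "not set", which may be stricter than dlt's own defaults.

```python
# Keys that bound each position-based paginator type (names taken from the rule above).
BOUND_KEYS = {
    "offset": ("total_path", "maximum_offset"),
    "page_number": ("total_path", "maximum_page"),
}


def paginator_may_loop_forever(paginator):
    """Return True if a dict-style paginator config declares no stop condition."""
    kind = paginator.get("type")
    if kind not in BOUND_KEYS:
        return False  # the rule only covers offset / page_number paginators
    if paginator.get("stop_after_empty_page") is True:
        return False  # stops on the first empty page
    # Conservative reading: a missing or None key counts as "no bound configured".
    return all(paginator.get(key) is None for key in BOUND_KEYS[kind])
```

A `True` result is a prompt to look closer at the config, not proof of an infinite loop.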
Likely a wrong or missing data_selector. dlt auto-detects the data array in the response but can fail silently on complex/nested responses. Fix: explicitly set data_selector as a JSONPath to the array (e.g. "data", "results.items").
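To see what a selector such as "results.items" should point at, it can help to walk a saved response by hand. The sketch below is a simplified stand-in that only follows plain dotted keys; it is not dlt's JSONPath implementation, and `select` and `sample` are hypothetical names used for illustration.

```python
def select(response, selector):
    """Follow a dotted selector such as "results.items" into a parsed JSON body."""
    node = response
    for key in selector.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # the selector does not match this response shape
        node = node[key]
    return node


# Made-up response body: the records live two levels down, next to metadata.
sample = {
    "meta": {"count": 2},
    "results": {"items": [{"id": 1}, {"id": 2}]},
}

records = select(sample, "results.items")  # the list of two records
wrong = select(sample, "data")             # None: no such key in this response
```

If the selector resolves to `None` or to a dict instead of a list of records, the resource is likely yielding nothing or the wrong level of the response.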
Inspect pipeline state to check the stored cursor value:
dlt pipeline -v <pipeline_name> info
Look for last_value in the resource state — verify it updates between runs. Also check logs for "Bind incremental on <resource_name>" to confirm the incremental param was bound.
Ref: https://dlthub.com/docs/general-usage/incremental/troubleshooting.md
Inspect the last pipeline run:
dlt pipeline -vv <pipeline_name> trace
Note: -vv goes BEFORE the pipeline name. Shows config/secret resolution, step timing, failures.
Each pipeline run generates one or more load packages. Use the trace command to find their IDs.
dlt pipeline -v <pipeline_name> load-package # most recent package
dlt pipeline -v <pipeline_name> load-package <load_id> # specific package
Shows package state, per-job details (table, file type, size, timing), and error messages for failed jobs. With -v also shows schema updates applied.
dlt pipeline <pipeline_name> failed-jobs
Scans all packages for failed jobs and displays error messages from the destination.
Load packages are stored at ~/.dlt/pipelines/<pipeline_name>/load/loaded/<load_id>/. Job files live in completed_jobs/ and failed_jobs/ subdirectories.
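Given the directory layout described above, the failed job files can also be listed directly from disk. This is a minimal sketch that relies only on that layout; `list_failed_job_files` and its `pipelines_dir` override are hypothetical, added here so the function can be pointed at a different root.

```python
from pathlib import Path


def list_failed_job_files(pipeline_name, pipelines_dir=None):
    """Map load_id -> sorted file names found in that package's failed_jobs/ folder."""
    root = Path(pipelines_dir) if pipelines_dir else Path.home() / ".dlt" / "pipelines"
    loaded = root / pipeline_name / "load" / "loaded"
    failed = {}
    if not loaded.is_dir():
        return failed  # pipeline has no loaded packages (or the name is wrong)
    for package in sorted(loaded.iterdir()):
        jobs_dir = package / "failed_jobs"
        if jobs_dir.is_dir():
            names = sorted(p.name for p in jobs_dir.iterdir() if p.is_file())
            if names:
                failed[package.name] = names
    return failed
```

This only shows which files exist. The error messages themselves are still easiest to read through the failed-jobs command.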
File format depends on the destination:
| Format | Default for | File extension |
|---|---|---|
| INSERT VALUES | duckdb, postgres, redshift, mssql, motherduck | .insert_values.gz |
| JSONL | bigquery, snowflake, filesystem | .jsonl.gz |
| Parquet | athena, databricks (also supported by duckdb, bigquery, snowflake) | .parquet |
| CSV | filesystem | .csv.gz |
Inspect gzipped files with zcat:
zcat ~/.dlt/pipelines/<pipeline_name>/load/loaded/<load_id>/completed_jobs/<file>.gz
Useful for verifying data transformations and debugging destination errors.
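Where zcat is not available, or when only the first few rows matter, a gzipped JSONL job file can be read with the standard library. The sketch below assumes the file is a `.jsonl.gz` job file with one JSON object per line; `read_jsonl_gz` is a hypothetical helper, and other formats such as `.insert_values.gz` should be read as plain text instead.

```python
import gzip
import json


def read_jsonl_gz(path, limit=5):
    """Return up to `limit` parsed rows from a gzipped JSONL job file."""
    rows = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if len(rows) >= limit:
                break
            if line.strip():  # skip blank lines
                rows.append(json.loads(line))
    return rows
```

Calling `read_jsonl_gz(path, limit=3)` on a completed job file returns a list of dicts that can be compared with what the source API returned.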
Before moving on, revert all debugging settings YOU introduced. Only revert what you changed — preserve any user settings that existed before.
Checklist:
- .dlt/config.toml: restore log_level to its previous value (e.g. WARNING). Remove http_show_error_body, request_timeout, request_max_attempts if you added them. Remove [extract] next_item_mode if you added it.
- Remove progress="log" from dlt.pipeline() if you added it. Remove .add_limit(N) if you added it for debugging.
- Do NOT remove settings the user had before you started debugging.
- validate-data to inspect schema and data, or hand over to explore-data (data-exploration toolkit) to jump straight into charts and analysis
- create-rest-api-pipeline step 6b for credential setup
- create-rest-api-pipeline to scaffold one first

npx claudepluginhub dlt-hub/dlthub-ai-workbench --plugin rest-api-pipeline