From oxy-skills
Build or extend ETL pipelines using DLT. Use when: (1) starting a new ETL project, (2) adding API connectors (Toast, Square, etc.), (3) adding spreadsheet/document ingestion, or (4) extending existing pipelines with new sources.
How this skill is triggered — by the user, by Claude, or both
Slash command
/oxy-skills:oxy-etl-builderThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an expert at building ETL (Extract-Transform-Load) pipelines using DLT (data-load-tools). Your role is to help users create robust, maintainable data pipelines that extract from APIs or files and load into data warehouses.
QUICK-REFERENCE.mdREADME.mdetl-style-guide.mdplaybook-api-connectors.mdplaybook-spreadsheets.mdtemplates/api-client-template.pytemplates/api-source-template.pytemplates/core/chunking-template.pytemplates/core/cli-template.pytemplates/core/pipeline-template.pytemplates/runner-template.pytemplates/spreadsheet-source-template.pytemplates/spreadsheet-template-template.pywarehouse-modeling.mdYou are an expert at building ETL (Extract-Transform-Load) pipelines using DLT (data-load-tools). Your role is to help users create robust, maintainable data pipelines that extract from APIs or files and load into data warehouses.
Before starting, determine the current state:
etl/ directory)etl/ directory exists)Skip directly to source type classification - the framework is already in place.
# Check project state
ls -la etl/core/pipeline.py 2>/dev/null && echo "Core exists" || echo "New project"
After scenario detection, classify what you're building:
What type of data source?
├─ Third-party API (Toast, Square, Stripe, etc.)
│ └─ Read: playbook-api-connectors.md
│
├─ Spreadsheet/File (XLSX, CSV, etc.)
│ └─ Read: playbook-spreadsheets.md
│
└─ Not sure
└─ Ask: "What is the data source? An API, a file/spreadsheet, or something else?"
Do NOT ask about warehouses upfront. Source code is warehouse-agnostic.
dlt_secrets.toml, .dlt/)settings.py or environment variablespyproject.toml for destination dependenciesSupported warehouses: ClickHouse, Snowflake, MotherDuck/DuckDB, BigQuery
Every ETL pipeline must produce these files:
etl/
├── sources/<provider>/
│ ├── __init__.py
│ ├── client.py # API client with auth, rate limiting
│ └── <entity>_source.py # DLT source with resources
├── runners/
│ └── <provider>_<entity>.py # Pipeline runner with CLI
└── transforms/ # Optional post-load transforms
└── compute_<entity>_metrics.py
etl/
├── sources/spreadsheets/
│ ├── __init__.py
│ ├── core.py # Shared utilities (if not exists)
│ └── templates/
│ ├── __init__.py
│ └── <template_name>.py # Template implementation
├── runners/
│ └── <entity>.py # File-based runner with CLI
└── transforms/ # Optional post-load transforms
Is this a new project?
├─ YES → Set up etl/core/ first
│ └─ Then: What source type?
└─ NO → What source type?
├─ API → Read playbook-api-connectors.md
│ └─ Create: client.py, source.py, runner.py
└─ Spreadsheet → Read playbook-spreadsheets.md
└─ Create: template.py, runner.py
If etl/core/ doesn't exist, create the framework first:
etl/
├── __init__.py
├── core/
│ ├── __init__.py
│ ├── pipeline.py # BasePipelineRunner, PipelineConfig
│ ├── chunking.py # Date range utilities
│ └── cli.py # Logging and CLI helpers
├── sources/
│ └── __init__.py
├── runners/
│ └── __init__.py
└── transforms/
└── __init__.py
See templates/core/ for the implementation files.
Before marking complete, verify:
@dlt.resource with write_disposition and primary_key@dlt.resource(name="orders", write_disposition="merge", primary_key="id")
def orders_resource(
modified_date: dlt.sources.incremental[str] = dlt.sources.incremental(
"modified_date",
initial_value=pendulum.now().subtract(days=7).isoformat()
)
):
if backfill_mode:
modified_date.start_value = "2015-01-01T00:00:00Z"
for entity_id in entity_ids:
yield lambda eid=entity_id: _fetch_orders(eid, client)
class MyRunner(BasePipelineRunner):
@property
def pipeline_name(self) -> str:
return "my_pipeline"
def get_source(self, config, ...):
return my_source(...)
def get_resources_config(self) -> dict[str, bool]:
return {"static_data": False, "time_series": True}
etl-style-guide.md - Naming conventions, directory structurewarehouse-modeling.md - DDL patterns for each warehouseplaybook-api-connectors.md - Complete API integration guideplaybook-spreadsheets.md - Spreadsheet ingestion guidetemplates/ - Copy-paste-ready code templates# Run pipeline
uv run python -m etl.runners.<runner> run
# Test with mock data
uv run python -m etl.runners.<runner> test
# Dry run (DuckDB, no warehouse)
uv run python -m etl.runners.<runner> run --dry-run
# Backfill historical data
uv run python -m etl.runners.<runner> run --backfill --start-date=2024-01-01
# Show configuration
uv run python -m etl.runners.<runner> config
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub oxy-hq/skills --plugin oxy-skills