Skill

python3-data

Guides Python data ETL, analysis, and scientific workflows with validation checklists, gotcha tables, decision aids, and modular layouts for pandas, NumPy, and Polars.

Python

Pydantic

data-engineering

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/python-engineering:python3-data

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Load `python3-core` for standing defaults. Load `python3-typing` for boundary schemas. Load `python3-testing` for parser and edge-case tests.

SKILL.md

51 lines · ~570 tokens

Stats

Language: Python

Parent stars: 45

Parent forks: 8

Maintenance: Excellent

Last commit: Apr 6, 2026

Python Data

Load python3-core for standing defaults. Load python3-typing for boundary schemas. Load python3-testing for parser and edge-case tests.

Quality Checklist

Schema validated at first stable ingress point — not deep in transforms
dtype= explicit in pd.read_csv() / pd.read_excel() — never rely on inference
No raw pd.DataFrame crossing module boundaries without documented column contract
Merge/join results checked for unexpected nulls and row count changes
model_config = {"strict": True} on all Pydantic boundary models
No inplace=True (discouraged by pandas; the call returns None, so reassigning its result silently replaces the frame with None)
Notebook logic that survived 3+ uses extracted into tested modules
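
The first, second, and fifth checklist items can be sketched together as follows. This is a minimal illustration, not the skill's own code: the `Order` model, the `ORDER_DTYPES` mapping, the `load_orders` helper, and the sample CSV are all hypothetical, and Pydantic v2 is assumed.

```python
import io

import pandas as pd
from pydantic import BaseModel, ConfigDict, ValidationError

# Hypothetical raw input; a real pipeline would read from a file path.
RAW_CSV = "order_id,customer_id,amount\n0001,17,9.50\n0002,18,12.00\n"

# Explicit dtypes: order_id stays a string, so its leading zeros survive.
ORDER_DTYPES = {"order_id": "string", "customer_id": "int64", "amount": "float64"}


class Order(BaseModel):
    """Hypothetical boundary model; strict mode disables type coercion."""

    model_config = ConfigDict(strict=True)

    order_id: str
    customer_id: int
    amount: float


def load_orders(source: io.StringIO) -> list[Order]:
    """Read with explicit dtypes, then validate every row at the ingress point."""
    frame = pd.read_csv(source, dtype=ORDER_DTYPES)
    return [Order(**row) for row in frame.to_dict(orient="records")]


orders = load_orders(io.StringIO(RAW_CSV))

# Strict mode rejects a string where an int is declared instead of coercing it.
try:
    Order(order_id="0003", customer_id="19", amount=1.0)
except ValidationError as exc:
    print(f"rejected: {exc.error_count()} error(s)")
```

Without `dtype=`, pandas would infer `order_id` as an integer column and drop the leading zeros, which is the kind of silent inference error the checklist warns about.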

Gotchas

Trap	What to do instead
`df["a"]["b"] = x` (chained indexing)	`df.loc["b", "a"] = x`; chained indexing may write to a temporary copy and leave `df` unchanged
`.apply(lambda)` on large frames	Vectorized ops first; `.apply()` only when no vectorized path exists
`pd.merge()` without post-check	Assert no unexpected nulls or duplicate keys after merge
`df.drop(..., inplace=True)`	`df = df.drop(...)`; with `inplace=True` the call returns `None`, so reassigning its result loses the frame
Bare `pd.read_csv(path)`	Always pass `dtype=` to prevent silent type inference errors
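
Three of the traps above (chained indexing, `.apply()`, and unchecked merges) can be avoided as in the sketch below. The `orders` and `customers` frames and their column names are made up for illustration; `validate=` and `indicator=` are existing parameters of `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical sample frames.
orders = pd.DataFrame(
    {
        "order_id": ["a1", "a2", "a3"],
        "customer_id": [1, 2, 4],
        "qty": [2, 1, 5],
        "unit_price": [3.0, 10.0, 1.5],
    }
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["north", "south", "east"]}
)

# One .loc call selects rows and column together, so the write hits `orders`.
orders.loc[orders["qty"] > 4, "unit_price"] = 1.25

# Vectorized arithmetic instead of .apply(lambda row: ...).
orders["total"] = orders["qty"] * orders["unit_price"]

# validate= raises MergeError if customer_id is duplicated on the right side;
# indicator= adds a _merge column that records where each row matched.
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Post-merge checks: row count is unchanged and unmatched keys are visible.
assert len(merged) == len(orders)
unmatched = merged.loc[merged["_merge"] == "left_only", "order_id"].tolist()
print(unmatched)
```

Whether an unmatched key should raise or only be logged depends on the pipeline; the sketch only makes the mismatch visible.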

Decision Table

Task	Use	Avoid
Tabular, under ~1M rows	pandas	Polars (overhead not justified)
Tabular, ~1M rows or more, or speed-critical	Polars	pandas
SQL-like analytics on local files	DuckDB	Loading everything into pandas
Read-only TOML config	`tomllib` (stdlib, binary mode `"rb"`)	`tomlkit`
Read/write TOML preserving comments	`tomlkit` (text mode)	`tomllib`

Module Layout

etl/
├── ingest.py      # raw data loading (boundary)
├── validate.py    # schema validation (boundary)
├── transform.py   # business logic (typed core)
├── load.py        # output writing (boundary)
└── types.py       # shared typed models
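
The split between boundary modules and the typed core could look roughly like the single-file sketch below. It uses only the standard library to stay self-contained; the `Sale` model, the function names, and the sample rows are hypothetical, and each comment names the module a function would live in.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


# types.py: shared typed models
@dataclass(frozen=True)
class Sale:
    sku: str
    quantity: int
    unit_price: float


# ingest.py: raw data loading (boundary), everything is still a string here
def ingest(source: TextIO) -> list[dict[str, str]]:
    return list(csv.DictReader(source))


# validate.py: schema validation (boundary), raw rows become typed models
def validate(rows: list[dict[str, str]]) -> list[Sale]:
    return [
        Sale(
            sku=row["sku"],
            quantity=int(row["quantity"]),
            unit_price=float(row["unit_price"]),
        )
        for row in rows
    ]


# transform.py: business logic (typed core), sees only Sale objects
def revenue_by_sku(sales: list[Sale]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for sale in sales:
        totals[sale.sku] = totals.get(sale.sku, 0.0) + sale.quantity * sale.unit_price
    return totals


# load.py: output writing (boundary)
def load(totals: dict[str, float], sink: TextIO) -> None:
    writer = csv.writer(sink)
    writer.writerow(["sku", "revenue"])
    for sku, revenue in sorted(totals.items()):
        writer.writerow([sku, f"{revenue:.2f}"])


raw = io.StringIO("sku,quantity,unit_price\nA,2,2.50\nB,1,4.00\nA,1,2.00\n")
out = io.StringIO()
load(revenue_by_sku(validate(ingest(raw))), out)
print(out.getvalue())
```

Only `validate` handles untyped input, so a malformed row fails there with a `ValueError` or `KeyError` rather than deep inside a transform.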
