From python-engineering
Guides Python data ETL, analysis, and scientific workflows with validation checklists, gotcha tables, decision aids, and modular layouts for pandas, NumPy, and Polars.
Slash command: `/python-engineering:python3-data`
Load `python3-core` for standing defaults. Load `python3-typing` for boundary schemas. Load `python3-testing` for parser and edge-case tests.

- Pass `dtype=` explicitly in `pd.read_csv()` / `pd.read_excel()`; never rely on inference.
- Never let a `pd.DataFrame` cross a module boundary without a documented column contract.
- Set `model_config = {"strict": True}` on all Pydantic boundary models.
- Avoid `inplace=True`: it is discouraged, returns `None`, and causes silent bugs.

| Trap | What to do instead |
|---|---|
| `df["a"]["b"] = x` (chained indexing) | `df.loc["b", "a"] = x`; chained assignment can write to a temporary copy and leave `df` unchanged |
| `.apply(lambda)` on large frames | Vectorized ops first; `.apply()` only when no vectorized path exists |
| `pd.merge()` without post-check | Assert no unexpected nulls or duplicate keys after the merge |
| `df.drop(..., inplace=True)` | `df = df.drop(...)`; `inplace=True` is discouraged and returns `None` |
| Bare `pd.read_csv(path)` | Always pass `dtype=` to prevent silent type inference errors |

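
The pandas items above can be sketched together in one short script. This is a minimal illustration, not a prescribed implementation: the CSV contents, column names, and the `tier` column are hypothetical, and `validate="many_to_one"` is one possible way to enforce the "no duplicate keys" check at merge time.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for real files.
ORDERS_CSV = "order_id,customer_id,amount\n1,007,10.5\n2,008,20.0\n3,007,5.25\n"
CUSTOMERS_CSV = "customer_id,name\n007,Ada\n008,Grace\n"

# Explicit dtypes: customer_id stays a string, so the leading zeros survive.
orders = pd.read_csv(
    io.StringIO(ORDERS_CSV),
    dtype={"order_id": "int64", "customer_id": "string", "amount": "float64"},
)
customers = pd.read_csv(
    io.StringIO(CUSTOMERS_CSV),
    dtype={"customer_id": "string", "name": "string"},
)

# validate= raises pandas.errors.MergeError if the right-hand keys are not unique.
merged = orders.merge(customers, on="customer_id", how="left", validate="many_to_one")

# Post-merge checks: no rows gained, no unmatched customers.
assert len(merged) == len(orders)
assert merged["name"].notna().all()

# One .loc call instead of chained indexing.
merged.loc[merged["amount"] > 10, "tier"] = "large"

# Reassign the result instead of passing inplace=True.
merged = merged.drop(columns=["order_id"])
```

Without the `dtype=` mapping, `customer_id` would be inferred as an integer column and `007` would be read as `7`, which is the kind of silent inference error the checklist warns about.
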
| Task | Use | Not |
|---|---|---|
| Tabular < 1M rows | pandas | Polars (overhead not justified) |
| Tabular > 1M rows or need speed | Polars | pandas |
| SQL-like analytics on local files | DuckDB | Loading everything into pandas |
| Read-only TOML config | tomllib (stdlib, binary mode "rb") | tomlkit |
| Read/write TOML preserving comments | tomlkit (text mode) | tomllib |
etl/
├── ingest.py # raw data loading (boundary)
├── validate.py # schema validation (boundary)
├── transform.py # business logic (typed core)
├── load.py # output writing (boundary)
└── types.py # shared typed models
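
A rough single-file sketch of how the layout above might divide responsibilities, using only the standard library. The `Reading` model, its fields, and the function names are hypothetical; in a real project each function would live in the module named in its comment.

```python
import csv
import io
from dataclasses import dataclass
from typing import TextIO


# types.py: shared typed model (hypothetical fields)
@dataclass(frozen=True)
class Reading:
    sensor: str
    value: float


# ingest.py: boundary, returns raw untyped rows
def ingest(source: TextIO) -> list[dict[str, str]]:
    return list(csv.DictReader(source))


# validate.py: boundary, converts raw rows into typed models or raises
def validate(rows: list[dict[str, str]]) -> list[Reading]:
    return [Reading(sensor=row["sensor"], value=float(row["value"])) for row in rows]


# transform.py: typed core, no I/O
def transform(readings: list[Reading]) -> dict[str, float]:
    totals: dict[str, float] = {}
    for reading in readings:
        totals[reading.sensor] = totals.get(reading.sensor, 0.0) + reading.value
    return totals


# load.py: boundary, writes the output
def load(totals: dict[str, float], sink: TextIO) -> None:
    writer = csv.writer(sink)
    writer.writerow(["sensor", "total"])
    for sensor, total in sorted(totals.items()):
        writer.writerow([sensor, total])


raw = io.StringIO("sensor,value\na,1.5\nb,2.0\na,0.5\n")
out = io.StringIO()
load(transform(validate(ingest(raw))), out)
```

Only `transform` works on typed models and performs no I/O, so it can be unit-tested without files; malformed input fails in `validate`, before it reaches the business logic.
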
npx claudepluginhub jamie-bitflight/claude_skills --plugin python-engineering