Skill

scikit-learn-consistency

Write, review, refactor, or debug Python code that uses scikit-learn (sklearn) — Pipeline, ColumnTransformer, StandardScaler, OneHotEncoder, train_test_split, cross_val_score, GridSearchCV, fit/transform/predict — using one canonical, modern idiom set. Use this skill whenever code preprocesses features, trains or evaluates an estimator, tunes hyperparameters, fixes data leakage, migrates off removed sklearn APIs (get_feature_names, OneHotEncoder(sparse=...)), or when the user hits errors like "NotFittedError", "Found unknown categories", or "got an unexpected keyword argument 'sparse'", or asks "why is my test score too good" or "is this the right sklearn way." Trigger it even when the user just says "train a model on this data" or "scale these features" or shows a stack trace mentioning sklearn — without saying "sklearn idioms."

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/scikit-learn-consistency:scikit-learn-consistency


SKILL.md

123 lines · ~2.1k tokens

Stats

Stars: 0

Maintenance: Good

Last Commit: Jun 12, 2026

scikit-learn — consistent, modern idioms

scikit-learn is stable and well known to models, yet generated code mixes eras: manual fit-the-scaler-on-everything preprocessing next to Pipelines, the removed get_feature_names next to get_feature_names_out, OneHotEncoder(sparse=...) next to sparse_output=, and hand-rolled CV loops next to cross_val_score. Worse, the most common era-mixing bug — fitting preprocessing on the full dataset before splitting — is data leakage that silently inflates every reported score. This skill pins one canonical idiom set: sklearn 1.x, Pipeline-first, leak-free.

Canonical idioms — always X, never Y

Always	Never	Why
`Pipeline`/`make_pipeline` wrapping preprocessing + model	scaler/imputer/encoder fit on the full dataset, then split	Fitting preprocessing before the split leaks test statistics into training — scores are silently optimistic.
`pipe.fit(X_train, y_train)` then `pipe.predict(X_test)`	`scaler.fit_transform(X_test)`	`fit_transform` on test data re-learns parameters from the test set; test must only be `transform`ed.
`ColumnTransformer` for mixed numeric/categorical columns	slicing columns by hand and gluing arrays with `np.hstack`	Column order and names get scrambled silently; CT keeps the mapping and supports `get_feature_names_out`.
`train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)` for classification	unstratified, unseeded splits	Class imbalance skews the split; unseeded runs are unreproducible.
`cross_val_score(pipe, X, y, cv=5)` / `GridSearchCV(pipe, ...)` on the whole pipeline	manual fold loops, or CV over a model fed pre-scaled data	CV over pre-fit preprocessing leaks across folds; the pipeline refits everything per fold.
`OneHotEncoder(handle_unknown="ignore", sparse_output=False)`	`OneHotEncoder(sparse=True)`	`sparse` was renamed `sparse_output` (deprecated 1.2, removed 1.4); without `handle_unknown="ignore"`, unseen categories crash `transform`.
`transformer.get_feature_names_out()`	`transformer.get_feature_names()`	`get_feature_names` was removed in 1.2.
`pipe.set_output(transform="pandas")` when you want DataFrames	wrapping numpy output back into DataFrames by hand	`set_output` keeps column names attached through every step.
`clone(estimator)` for a fresh copy	re-`fit`ting a shared fitted object across experiments	`fit` overwrites learned state; warm-started or shared objects bleed state between runs.
explicit `pos_label=`/`average=` in `f1_score`, `precision_score`, etc.	relying on defaults for multiclass/imbalanced data	The defaults (`pos_label=1`, `average="binary"`) raise or mislead on non-{0,1} or multiclass labels.
`random_state=` on every stochastic estimator/splitter	seeding only `np.random.seed`	Global numpy seeding does not control most sklearn randomness; pass `random_state` explicitly.
`joblib.dump(pipe, path)` + record sklearn version	pickling across unpinned versions	Persisted estimators are not guaranteed to load across sklearn versions.

House style — one pipeline, fit on train only, CV on the whole thing:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X is assumed to be a pandas DataFrame containing these columns and y a binary
# 0/1 target; both are placeholders supplied by the caller.
numeric = ["age", "income"]
categorical = ["city", "plan"]

# Every learned preprocessing step lives inside the ColumnTransformer.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical),
])

pipe = Pipeline([("prep", preprocess),
                 ("clf", LogisticRegression(max_iter=1000, random_state=42))])

# Split before anything is fit; stratify keeps class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each CV fold refits the whole pipeline; scoring="f1" scores the positive class (label 1).
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
search.fit(X_train, y_train)
# Touch the test set once, at the end.
test_score = search.score(X_test, y_test)

Pitfalls that produce silently wrong results

Leakage via pre-split preprocessing: scaling, imputing, encoding, feature selection, or resampling fit on the full data before train_test_split contaminates the test set. Symptom: suspiciously high test scores. Fix: everything learned from data goes inside the Pipeline, fit on train only.
Leakage via pre-CV preprocessing: the same bug one level up — transforming X once and then cross-validating the bare model leaks across folds. Cross-validate the pipeline.
fit_transform on test data re-estimates means/categories from the test set. The contract: fit/fit_transform on train, transform (and predict) everywhere else.
predict vs predict_proba vs decision_function: ROC-AUC and log-loss need scores (predict_proba(X)[:, 1] or decision_function), not hard labels. roc_auc_score(y, pipe.predict(X)) runs without error and quietly reports the wrong number.
predict_proba column order follows pipe.classes_, not your assumption — index the positive class via list(pipe.classes_).index(pos) when in doubt.
Unseeded everything: train_test_split, KFold(shuffle=True), forests, k-means, and stochastic solvers (e.g. SGD, sag/saga) all draw random numbers. Results that can't be reproduced can't be compared.
Accuracy on imbalanced data looks great while the model predicts the majority class. Report a confusion matrix plus f1/average_precision/roc_auc with explicit pos_label/average.
Refitting a fitted estimator in a loop "to compare configs" carries warm-start or attribute state in some estimators; clone() gives a cold, unfitted copy with the same hyperparameters.
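
The scoring pitfalls above (scores vs hard labels, predict_proba column order, explicit pos_label/average) can be sketched as follows. The synthetic dataset and the choice of 1 as the positive label are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data (assumption: label 1 is the positive class).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)

# Look up the positive-class column instead of assuming it is column 1.
pos = 1
pos_idx = list(pipe.classes_).index(pos)
scores = pipe.predict_proba(X_test)[:, pos_idx]

# ROC-AUC needs continuous scores.
auc_from_scores = roc_auc_score(y_test, scores)
# This also runs, but it only evaluates one threshold's hard labels.
auc_from_labels = roc_auc_score(y_test, pipe.predict(X_test))

# Spell out pos_label and average rather than relying on the defaults.
f1_pos = f1_score(y_test, pipe.predict(X_test), pos_label=pos, average="binary")
```

Comparing auc_from_scores with auc_from_labels on real data is a quick way to spot code that passed hard labels to a ranking metric.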

Version notes

Target scikit-learn 1.x. Key breaking lines: 1.0 made most parameters keyword-only (passing them positionally, deprecated since 0.23, now raises TypeError) and introduced get_feature_names_out; 1.2 removed get_feature_names, deprecated OneHotEncoder(sparse=...) in favor of sparse_output= (the old name was removed in 1.4), and introduced set_output(transform="pandas"). If the user is pinned below 1.2, keep the same leak-free Pipeline style: use OneHotEncoder(sparse=...) in place of sparse_output= and skip set_output, which does not exist there. Say so explicitly rather than mixing eras.
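
When one codebase genuinely has to run on both sides of the 1.2 line, one way to make the difference explicit is a small factory that picks the keyword from the installed version. This is a sketch, not part of sklearn: make_dense_encoder is a hypothetical helper name, and the version parsing assumes a "major.minor..." version string.

```python
import sklearn
from sklearn.preprocessing import OneHotEncoder


def make_dense_encoder():
    """Hypothetical helper: dense one-hot encoder for the installed sklearn."""
    # Assumes sklearn.__version__ starts with "major.minor", e.g. "1.4.2".
    major, minor = (int(part) for part in sklearn.__version__.split(".")[:2])
    if (major, minor) >= (1, 2):
        # Modern spelling (1.2 and newer).
        return OneHotEncoder(handle_unknown="ignore", sparse_output=False)
    # Legacy spelling, only for environments pinned below 1.2.
    return OneHotEncoder(handle_unknown="ignore", sparse=False)


encoder = make_dense_encoder()
encoded = encoder.fit_transform([["a"], ["b"], ["a"]])
```

Keeping the branch in one named place documents the pin; the rest of the code still uses a single idiom set.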

Workflow

Split first (train_test_split with random_state, stratify for classification) — before any statistic is computed from the data.
Build a ColumnTransformer + Pipeline holding every learned step (impute → encode/scale → model). Nothing learned from data lives outside it.
Fit on train; transform/predict only on validation and test.
Tune and evaluate with cross_val_score/GridSearchCV over the pipeline, scoring chosen to match the problem (explicit pos_label/average; probabilities for AUC).
Seed everything (random_state=), then report test-set performance once, at the end.
When reviewing existing code, flag any "Never" pattern above — especially pre-split fit_transform — and rewrite it into the pipeline rather than patching around it.
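
The workflow above, applied to a typical review finding, might look like the sketch below. The synthetic data, the accuracy metric, and the commented-out "before" snippet are illustrative assumptions, not code from a real project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# "Never" pattern, shown for contrast only: scaler fit on all rows before the split.
# X_scaled = StandardScaler().fit_transform(X)
# X_train, X_test, y_train, y_test = train_test_split(X_scaled, y)

# 1. Split first, seeded and stratified.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 2. Every learned step goes inside the pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 3-4. Cross-validate the whole pipeline on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="accuracy")

# 5. Fit on train, then report test performance once.
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# The fitted scaler has only ever seen the training rows.
scaler = pipe.named_steps["standardscaler"]
rows_seen = int(scaler.n_samples_seen_)
```

Checking n_samples_seen_ against the training-set size is a cheap assertion that the scaler never saw test rows.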

For the fuller migration map (old API → modern API), leakage taxonomy, metric selection details, and persistence guidance, read references/scikit-learn-patterns.md.
