Write, review, refactor, or debug Python code that uses scikit-learn (sklearn) — Pipeline, ColumnTransformer, StandardScaler, OneHotEncoder, train_test_split, cross_val_score, GridSearchCV, fit/transform/predict — using one canonical, modern idiom set. Use this skill whenever code preprocesses features, trains or evaluates an estimator, tunes hyperparameters, fixes data leakage, migrates off removed sklearn APIs (get_feature_names, OneHotEncoder(sparse=...)), or when the user hits errors like "NotFittedError", "Found unknown categories", or "got an unexpected keyword argument 'sparse'", or asks "why is my test score too good" or "is this the right sklearn way." Trigger it even when the user just says "train a model on this data" or "scale these features" or shows a stack trace mentioning sklearn — without saying "sklearn idioms."
Slash command: /scikit-learn-consistency:scikit-learn-consistency
scikit-learn is stable and well known to models, yet generated code mixes eras: manual
fit-the-scaler-on-everything preprocessing next to Pipelines, the removed
get_feature_names next to get_feature_names_out, OneHotEncoder(sparse=...) next to
sparse_output=, and hand-rolled CV loops next to cross_val_score. Worse, the most
common era-mixing bug — fitting preprocessing on the full dataset before splitting — is
data leakage that silently inflates every reported score. This skill pins one canonical
idiom set: sklearn 1.x, Pipeline-first, leak-free.
| Always | Never | Why |
|---|---|---|
| Pipeline/make_pipeline wrapping preprocessing + model | scaler/imputer/encoder fit on the full dataset, then split | Fitting preprocessing before the split leaks test statistics into training, so scores are silently optimistic. |
| pipe.fit(X_train, y_train) then pipe.predict(X_test) | scaler.fit_transform(X_test) | fit_transform on test data re-learns parameters from the test set; test data must only be transformed. |
| ColumnTransformer for mixed numeric/categorical columns | slicing columns by hand and gluing arrays with np.hstack | Column order and names get scrambled silently; ColumnTransformer keeps the mapping and supports get_feature_names_out. |
| train_test_split(X, y, test_size=0.2, random_state=42, stratify=y) for classification | unstratified, unseeded splits | With imbalanced classes an unstratified split can distort class proportions; unseeded runs are unreproducible. |
| cross_val_score(pipe, X, y, cv=5) / GridSearchCV(pipe, ...) on the whole pipeline | manual fold loops, or CV over a model fed pre-scaled data | CV over pre-fit preprocessing leaks across folds; the pipeline refits everything per fold. |
| OneHotEncoder(handle_unknown="ignore", sparse_output=False) | OneHotEncoder(sparse=True) | sparse was renamed sparse_output (deprecated in 1.2, removed in 1.4); without handle_unknown="ignore", unseen categories make transform raise. |
| transformer.get_feature_names_out() | transformer.get_feature_names() | get_feature_names was removed in 1.2. |
| pipe.set_output(transform="pandas") when you want DataFrames | wrapping numpy output back into DataFrames by hand | set_output keeps column names attached through every step. |
| clone(estimator) for a fresh copy | re-fitting a shared fitted object across experiments | fit overwrites learned state; warm-started or shared objects bleed state between runs. |
| explicit pos_label=/average= in f1_score, precision_score, etc. | relying on defaults for multiclass/imbalanced data | The defaults (pos_label=1, average="binary") raise or mislead on non-{0,1} or multiclass labels. |
| random_state= on every stochastic estimator/splitter | seeding only np.random.seed | With random_state=None, sklearn draws from numpy's global RNG, so a global seed makes results depend on call order and on any other code that consumes that RNG; pass random_state explicitly. |
| joblib.dump(pipe, path) + record sklearn version | pickling across unpinned versions | Persisted estimators are not guaranteed to load across sklearn versions. |
House style — one pipeline, fit on train only, CV on the whole thing:
```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# X is expected to be a DataFrame containing the columns below; y holds binary labels.
numeric = ["age", "income"]
categorical = ["city", "plan"]

# Preprocessing is defined per column group and lives inside the pipeline.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical),
])

pipe = Pipeline([("prep", preprocess),
                 ("clf", LogisticRegression(max_iter=1000, random_state=42))])

# Split before anything is fit, so no test statistics reach training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Tune over the whole pipeline: preprocessing is refit inside every fold.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Score the held-out test set once, with the same metric (f1).
test_score = search.score(X_test, y_test)
```
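One way to check the "fit on train only" claim is to inspect what the scaler inside a fitted pipeline actually learned. The sketch below uses synthetic data (an assumption made purely for illustration) and compares the stored means with the training-set and full-dataset means.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = (X[:, 0] + rng.normal(size=200) > 5.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)  # the scaler only ever sees X_train

learned_mean = pipe.named_steps["scale"].mean_
uses_train_only = np.allclose(learned_mean, X_train.mean(axis=0))
matches_full_data = np.allclose(learned_mean, X.mean(axis=0))

# Train-only statistics match; full-dataset statistics do not.
print(uses_train_only, matches_full_data)
```

Had the scaler been fit on X before the split, its means would equal the full-dataset means, which is exactly the leak the pipeline prevents.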
- Fitting preprocessing before train_test_split contaminates the test set. Symptom: suspiciously high test scores. Fix: everything learned from data goes inside the Pipeline, fit on train only.
- Preprocessing X once and then cross-validating the bare model leaks across folds. Cross-validate the pipeline.
- fit_transform on test data re-estimates means/categories from the test set. The contract: fit/fit_transform on train, transform (and predict) everywhere else.
- predict vs predict_proba vs decision_function: ROC-AUC needs continuous scores (predict_proba(X)[:, 1] or decision_function) and log-loss needs probabilities (predict_proba), not hard labels. roc_auc_score(y, pipe.predict(X)) runs without error and quietly reports the wrong number.
- predict_proba column order follows pipe.classes_, not your assumption; index the positive class via list(pipe.classes_).index(pos) when in doubt.
- train_test_split, KFold(shuffle=True), forests, k-means and some solvers (for example sag and saga) are stochastic. Results that can't be reproduced can't be compared, so pass random_state= to each.
- On imbalanced or multiclass targets, use f1/average_precision/roc_auc with explicit pos_label/average.
- A fitted estimator carries its learned state with it; clone() gives a cold, unfitted copy with the same hyperparameters.

Target scikit-learn 1.x. Key breaking lines: 1.0 made most parameters keyword-only, so positional calls that had long been deprecated now raise; 1.2 removed get_feature_names and deprecated OneHotEncoder(sparse=...) in favor of sparse_output= (the old name was removed in 1.4); 1.2 also introduced set_output(transform="pandas"). If the user is pinned below 1.2, keep the same leak-free Pipeline style: only sparse_output (spelled sparse= there) and set_output (not available) differ. Say so explicitly rather than mixing eras.
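The clone() and random_state points above can be sketched as follows. The dataset comes from make_classification and the forest settings are arbitrary choices for illustration, not recommendations from this skill.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

# Synthetic data, assumed for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

base = RandomForestClassifier(n_estimators=25, random_state=42)
base.fit(X, y)

# clone() copies hyperparameters but none of the learned state.
fresh = clone(base)
try:
    check_is_fitted(fresh)
    fresh_is_fitted = True
except NotFittedError:
    fresh_is_fitted = False

same_params = fresh.get_params() == base.get_params()

# With the same integer random_state and the same data, the refit is reproducible.
fresh.fit(X, y)
reproducible = bool(np.array_equal(fresh.predict_proba(X), base.predict_proba(X)))

print(fresh_is_fitted, same_params, reproducible)
```

Because the seed travels with the estimator's parameters, a cloned estimator reproduces the original run without relying on any global numpy seed.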
- Split first (train_test_split with random_state, stratify for classification), before any statistic is computed from the data.
- Evaluate with cross_val_score/GridSearchCV over the pipeline, scoring chosen to match the problem (explicit pos_label/average; probabilities for AUC).
- Seed every stochastic step (random_state=), then report test-set performance once, at the end.
- When existing code leaks, for example through a test-set fit_transform, rewrite it into the pipeline rather than patching around it.

For the fuller migration map (old API → modern API), leakage taxonomy, metric selection details, and persistence guidance, read references/scikit-learn-patterns.md.
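As a rough end-to-end sketch of the workflow (split first, cross-validate the whole pipeline, report the test score once), the example below scores ROC-AUC from probabilities and looks up the positive-class column through classes_. The synthetic imbalanced dataset and the choice of 1 as the positive label are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data (assumed for illustration only).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Split before anything is fit.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocessing and model travel together.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Cross-validate the whole pipeline on the training data only.
cv_auc = cross_val_score(pipe, X_train, y_train, cv=5, scoring="roc_auc")

# Final fit, then a single test-set report using probabilities, not hard labels.
pipe.fit(X_train, y_train)
pos = 1  # assumed positive label
pos_idx = list(pipe.classes_).index(pos)
test_auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, pos_idx])

print(cv_auc.mean(), test_auc)
```

Looking up pos_idx through classes_ keeps the probability column correct even when labels are strings or are not ordered the way one expects.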
npx claudepluginhub guidogl/scikit-learn-consistency --plugin scikit-learn-consistency