Compute group fairness metrics (selection rate, demographic parity, equalized odds) per sensitive feature with `MetricFrame`, then mitigate disparities using Reductions algorithms (`ExponentiatedGradient` with constraint = `DemographicParity`/`EqualizedOdds`). Wire group-disaggregated assertions into the model-evaluation gate.
Slash command: /qa-ml-models:fairlearn-fairness
Fairlearn provides "Metrics - Tools to assess which groups are
negatively impacted and compare models across fairness and accuracy
dimensions" and "Algorithms - Techniques to mitigate unfairness"
per the Fairlearn quickstart. Two primitives: MetricFrame (group
disaggregation) and mitigation algorithms (ExponentiatedGradient
from fairlearn.reductions, ThresholdOptimizer from
fairlearn.postprocessing).
pip install fairlearn
# OR
conda install -c conda-forge fairlearn
Per the Fairlearn quickstart.
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# X, y_true, and sex (the sensitive feature) come from your own dataset.
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(X, y_true)
y_pred = classifier.predict(X)

# Disaggregate accuracy by the sensitive feature.
mf = MetricFrame(
    metrics=accuracy_score,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(mf.by_group)
print(f"Disparity (max-min): {mf.difference()}")
Per the Fairlearn quickstart. sensitive_features can be a single
Series, or a DataFrame or 2-D array with one column per attribute
for intersectional analysis (sex × race).
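To make the intersectional point concrete, here is a minimal pure-Python sketch of what group disaggregation over two attributes computes. The helper `accuracy_by_intersection` and the toy data are hypothetical and only illustrate the idea; with Fairlearn you would pass both columns as `sensitive_features` to `MetricFrame` instead.

```python
from collections import defaultdict


def accuracy_by_intersection(y_true, y_pred, *sensitive_columns):
    """Accuracy per group, keyed by a tuple of sensitive attribute values."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for yt, yp, key in zip(y_true, y_pred, zip(*sensitive_columns)):
        totals[key] += 1
        hits[key] += int(yt == yp)
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical toy data, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
race = ["A", "A", "B", "B", "A", "A", "B", "B"]

by_sex = accuracy_by_intersection(y_true, y_pred, sex)
by_race = accuracy_by_intersection(y_true, y_pred, race)
by_group = accuracy_by_intersection(y_true, y_pred, sex, race)

# Gap between the best and worst intersectional group.
gap = max(by_group.values()) - min(by_group.values())
print(by_sex, by_race, by_group, gap)
```

In this toy data each attribute on its own shows no accuracy gap, while the (sex, race) intersection shows a gap of 0.5, which is the kind of disparity a single-attribute analysis misses.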
from fairlearn.metrics import selection_rate

sr = MetricFrame(
    metrics=selection_rate,
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(sr.by_group)
# Demographic Parity Difference (DPD)
print(f"DPD: {sr.difference()}")
DPD = max group selection rate − min group selection rate. Industry guidance often cites the 80% rule (selection rate ratio ≥ 0.8 between groups) as a soft threshold; because that rule is a ratio rather than a difference, check it with `sr.ratio()` instead of `sr.difference()`. Consult legal counsel for binding thresholds in your jurisdiction.
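The difference and the ratio can disagree, which is easy to see with a small worked example. The sketch below uses only the standard library; the helper `selection_rates` and the toy predictions are hypothetical. Fairlearn ships `demographic_parity_difference` and `demographic_parity_ratio` in `fairlearn.metrics` for the same quantities.

```python
def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group."""
    counts, positives = {}, {}
    for yp, g in zip(y_pred, groups):
        counts[g] = counts.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + int(yp == 1)
    return {g: positives[g] / counts[g] for g in counts}


# Hypothetical toy predictions: 2 of 20 selected in F, 1 of 20 in M.
y_pred = [1] * 2 + [0] * 18 + [1] * 1 + [0] * 19
groups = ["F"] * 20 + ["M"] * 20

rates = selection_rates(y_pred, groups)
dpd = max(rates.values()) - min(rates.values())        # difference
dp_ratio = min(rates.values()) / max(rates.values())   # ratio (80% rule)

passes_dpd_budget = dpd <= 0.10
passes_80_percent_rule = dp_ratio >= 0.8
print(rates, dpd, dp_ratio, passes_dpd_budget, passes_80_percent_rule)
```

Here the selection rates are 0.10 and 0.05, so the DPD of 0.05 fits a 0.10 budget while the ratio of 0.5 fails the 80% rule. When selection rates are low, a difference-based budget alone can be too lenient.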
from fairlearn.metrics import (
    MetricFrame,
    false_positive_rate,
    selection_rate,
    true_positive_rate,
)

# Report several metrics per group in one frame.
mf = MetricFrame(
    metrics={
        "TPR": true_positive_rate,
        "FPR": false_positive_rate,
        "selection_rate": selection_rate,
    },
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(mf.by_group)
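Equalized Odds requires both TPR and FPR to be equal across groups. Unlike Demographic Parity, it conditions on the true label, so for a classifier that is better than chance the two criteria generally cannot both hold when base rates differ across groups.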
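As a rough illustration of how an equalized-odds gap can be summarized in one number, the sketch below takes the larger of the TPR gap and the FPR gap across groups. The helper `rates_by_group` and the toy labels are hypothetical; as far as I know this matches the default behaviour of `fairlearn.metrics.equalized_odds_difference`, but treat that as an assumption and check the API reference.

```python
def rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)}.

    Assumes every group contains at least one positive and one negative
    label; otherwise a rate is undefined and this raises ZeroDivisionError.
    """
    tally = {}
    for yt, yp, g in zip(y_true, y_pred, groups):
        t = tally.setdefault(g, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if yt == 1:
            t["tp" if yp == 1 else "fn"] += 1
        else:
            t["fp" if yp == 1 else "tn"] += 1
    return {
        g: (t["tp"] / (t["tp"] + t["fn"]), t["fp"] / (t["fp"] + t["tn"]))
        for g, t in tally.items()
    }


# Hypothetical toy data: 4 positives and 4 negatives per group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 1, 1, 0, 1, 0, 0, 0] + [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["F"] * 8 + ["M"] * 8

rates = rates_by_group(y_true, y_pred, groups)
tprs = [tpr for tpr, _ in rates.values()]
fprs = [fpr for _, fpr in rates.values()]
tpr_gap = max(tprs) - min(tprs)
fpr_gap = max(fprs) - min(fprs)
equalized_odds_gap = max(tpr_gap, fpr_gap)
print(rates, tpr_gap, fpr_gap, equalized_odds_gap)
```

In this data the FPR is identical across groups, so the whole gap comes from the TPR: group M's true positives are found less often than group F's.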
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
constraint = DemographicParity()
mitigator = ExponentiatedGradient(classifier, constraint)
mitigator.fit(X, y_true, sensitive_features=sex)
y_pred_mitigated = mitigator.predict(X)
Per the Fairlearn quickstart: this approach significantly reduces
selection-rate differences while maintaining accuracy. Other
constraints: EqualizedOdds, TruePositiveRateParity,
FalsePositiveRateParity.
from fairlearn.postprocessing import ThresholdOptimizer

postprocess = ThresholdOptimizer(
    estimator=classifier,
    constraints="demographic_parity",
    prefit=True,
)
postprocess.fit(X, y_true, sensitive_features=sex)
y_pred_pp = postprocess.predict(X, sensitive_features=sex)
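Cheaper than retraining: the underlying model is left unchanged and group-specific thresholds are applied to its outputs. The trade-off is that the sensitive feature must be supplied at prediction time.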
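The idea of per-group thresholds can be sketched without Fairlearn. The code below is a deliberately simplified illustration, not `ThresholdOptimizer`'s actual procedure (which optimizes the thresholds and may randomize between them). The helpers `per_group_thresholds` and `predict_with_thresholds`, the toy scores, and the fixed target rate are all hypothetical.

```python
import math


def per_group_thresholds(scores, groups, target_rate):
    """Pick, per group, the score cut-off that selects ~target_rate of it."""
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    thresholds = {}
    for g, vals in by_group.items():
        vals.sort(reverse=True)
        k = max(1, math.ceil(target_rate * len(vals)))
        thresholds[g] = vals[k - 1]  # select scores >= the k-th highest
    return thresholds


def predict_with_thresholds(scores, groups, thresholds):
    """Apply each example's group-specific threshold to its score."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


# Hypothetical model scores: group M is scored systematically lower.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
groups = ["F"] * 4 + ["M"] * 4

# A single global threshold of 0.5 selects all of F but only one of M.
global_preds = [int(s >= 0.5) for s in scores]

thresholds = per_group_thresholds(scores, groups, target_rate=0.5)
group_preds = predict_with_thresholds(scores, groups, thresholds)
print(global_preds, thresholds, group_preds)
```

With group-specific cut-offs both groups end up with a selection rate of 0.5, and the scoring model itself never changes. Note that prediction needs the `groups` argument, which mirrors why `postprocess.predict` takes `sensitive_features`.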
def assert_fairness(y_true, y_pred, sensitive, max_dpd=0.10):
    # Fail the evaluation gate when the selection-rate gap exceeds the budget.
    sr = MetricFrame(
        metrics=selection_rate,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive,
    )
    dpd = sr.difference()
    if dpd > max_dpd:
        raise AssertionError(
            f"Demographic Parity Difference {dpd:.3f} exceeds budget {max_dpd}"
        )

assert_fairness(y_true, y_pred, sex, max_dpd=0.10)
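A gate that checks several metrics usually reads better if it reports every violation at once instead of stopping at the first. The sketch below is one possible shape for that, using plain dictionaries. The function `check_fairness_budgets`, the metric values, and the budgets are hypothetical. With Fairlearn, the per-group values could plausibly come from something like `mf.by_group["TPR"].to_dict()` on a multi-metric `MetricFrame`.

```python
def check_fairness_budgets(by_group_metrics, budgets):
    """Return a list of violation messages; an empty list means the gate passes.

    by_group_metrics: {metric_name: {group: value}}
    budgets:          {metric_name: maximum allowed max-min gap}
    """
    violations = []
    for name, budget in budgets.items():
        values = list(by_group_metrics[name].values())
        gap = max(values) - min(values)
        if gap > budget:
            violations.append(f"{name}: gap {gap:.3f} exceeds budget {budget}")
    return violations


# Hypothetical per-group metrics and budgets, for illustration only.
by_group_metrics = {
    "selection_rate": {"F": 0.5, "M": 0.375},
    "TPR": {"F": 0.75, "M": 0.5},
    "FPR": {"F": 0.25, "M": 0.25},
}
budgets = {"selection_rate": 0.10, "TPR": 0.20, "FPR": 0.10}

violations = check_fairness_budgets(by_group_metrics, budgets)
for message in violations:
    print(message)
gate_passed = not violations
```

In a CI job, the caller could raise or exit non-zero when `gate_passed` is false, so the full list of exceeded budgets appears in a single run.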
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Compute aggregate accuracy only | Hides group disparities | Always use MetricFrame (Step 2) |
| Choose Demographic Parity for all problems | DP can be inappropriate when base rates legitimately differ across groups | Match constraint to legal/ethical context: DP, EO, EOD, EOP |
| Mitigate via training data resampling alone | Doesn't generalize to new data; brittle | Use Reductions (Step 5) or post-processing (Step 6) |
| Single sensitive attribute (e.g., sex only) | Misses intersectional disparities (e.g., those affecting Black women) | Pass 2-D sensitive_features for intersection (Step 2) |
| Hard-code 80% rule globally | Not legally binding everywhere; not appropriate for all metrics | Tune max_dpd per use case + legal counsel; use waiver template if scope-exclusion needed |
npx claudepluginhub testland/qa --plugin qa-ml-models