The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric

Nathan Kallus, Angela Zhou

DOI: https://doi.org/10.48550/arXiv.1902.05826

2019-06-02

Abstract: Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We use xAUC analysis to audit predictive risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.

Machine Learning

What problem does this paper attempt to address?

This paper addresses how to evaluate the fairness of predictive risk scores used in high-stakes decision-making, especially when the scores serve purposes beyond binary classification. Specifically, the authors assess fairness from the perspective of bipartite ranking, where the goal is to rank positive examples above negative ones. Prior fairness research has mainly focused on binary classification, measuring differences between groups through confusion-matrix quantities such as the true positive rate and the false positive rate. These measures may not fully reflect the varied, non-binary ways risk scores are used in practice, for example in resource allocation decisions in criminal justice, healthcare, and credit. To this end, the paper introduces a new metric, the xAUC disparity, to evaluate how risk scores rank members of different protected groups relative to one another. The xAUC disparity is defined as the difference between the probability that a random positive example from one protected group is ranked above a random negative example from another protected group and the probability of the same event with the two groups' roles swapped. The authors also provide a decomposition of the bipartite ranking loss into components that involve this cross-group discrepancy and components that involve pure predictive ability within each group. Using xAUC analysis to audit risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, the paper describes disparities that are not evident from comparing within-group predictive performance alone. This suggests that the xAUC disparity can give a fuller picture of the fairness of risk scores in practice, especially in resource allocation decisions.
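The definition above can be illustrated with a minimal sketch, which is not the authors' implementation. It estimates the probability that a positive example from one group is scored above a negative example from another group by comparing all such pairs, then takes the difference of the two directions. The function names `xauc` and `xauc_disparity`, the toy data, and the convention of counting tied scores as one half are assumptions made for illustration.

```python
import numpy as np


def xauc(scores, labels, groups, pos_group, neg_group):
    """Empirical estimate of P(score of a positive from pos_group
    > score of a negative from neg_group).

    Ties are counted as 1/2 (an assumption, borrowed from the usual
    empirical AUC convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    groups = np.asarray(groups)

    pos = scores[(labels == 1) & (groups == pos_group)]
    neg = scores[(labels == 0) & (groups == neg_group)]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")

    # Pairwise score differences: one row per positive, one column per negative.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def xauc_disparity(scores, labels, groups, group_a, group_b):
    """Difference between the two cross-group ranking probabilities."""
    return (xauc(scores, labels, groups, group_a, group_b)
            - xauc(scores, labels, groups, group_b, group_a))


# Hypothetical toy data: eight individuals, two groups "a" and "b".
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.2, 0.7, 0.5]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(xauc(scores, labels, groups, "a", "b"))            # 0.5
print(xauc(scores, labels, groups, "b", "a"))            # 0.25
print(xauc_disparity(scores, labels, groups, "a", "b"))  # 0.25
```

In this toy data, positives from group a outrank negatives from group b in half of the cross-group pairs, while positives from group b outrank negatives from group a in only a quarter of them, giving a disparity of 0.25. Calling `xauc` with the same group in both positions gives the ordinary within-group AUC.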

Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment

On (assessing) the fairness of risk score models

Xorder: A Model Agnostic Post-Processing Framework for Achieving Ranking Fairness While Maintaining Algorithm Utility.

Fairer and more accurate, but for whom?

Counterfactual risk assessments, evaluation, and fairness

Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds

Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets

Almost Politically Acceptable Criminal Justice Risk Assessment

Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment

Intrinsic Fairness-Accuracy Tradeoffs under Equalized Odds

Fair prediction with disparate impact: A study of bias in recidivism prediction instruments

Fair and Effective Policing for Neighborhood Safety: Understanding and Overcoming Selection Biases

Risk Scores, Label Bias, and Everything but the Kitchen Sink

Counterfactual Reasoning for Fair Clinical Risk Prediction

Fairness for AUC via Feature Augmentation

Fairness Evaluation in Presence of Biased Noisy Labels

Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility

Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination

Facing the Challenges of Developing Fair Risk Scoring Models