The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric

Nathan Kallus, Angela Zhou
DOI: https://doi.org/10.48550/arXiv.1902.05826
2019-06-02
Abstract: Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used for a binary classification task. This may not account, however, for the more diverse downstream uses of risk scores and their non-binary nature. To better account for this, in this paper, we investigate the fairness of predictive risk scores from the point of view of a bipartite ranking task, where one seeks to rank positive examples higher than negative ones. We introduce the xAUC disparity as a metric to assess the disparate impact of risk scores and define it as the difference in the probabilities of ranking a random positive example from one protected group above a negative one from another group and vice versa. We provide a decomposition of bipartite ranking loss into components that involve the discrepancy and components that involve pure predictive ability within each group. We use xAUC analysis to audit predictive risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, where it describes disparities that are not evident from simply comparing within-group predictive performance.
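The definition of xAUC in the abstract can be made concrete with pairwise comparisons. Below is a minimal sketch in Python with NumPy, assuming binary outcomes y (1 = positive), one protected-group label per example, and ties counted as 1/2 (a common AUC convention; the definition above uses a strict "ranked above"). The function names cross_auc and xauc_disparity and the toy data are hypothetical, for illustration only; this is not the authors' code.

```python
import numpy as np


def cross_auc(pos_scores, neg_scores):
    """Empirical P(score of a positive > score of a negative), ties counted as 1/2.

    Compares every positive against every negative (O(n_pos * n_neg) memory).
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def xauc_disparity(scores, y, group, a, b):
    """Return (xAUC(a, b), xAUC(b, a), their difference).

    xAUC(a, b): probability that a positive from group a is ranked above
    a negative from group b (hypothetical helper, assumption-labeled).
    """
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    group = np.asarray(group)
    pos_a = scores[(group == a) & (y == 1)]
    neg_a = scores[(group == a) & (y == 0)]
    pos_b = scores[(group == b) & (y == 1)]
    neg_b = scores[(group == b) & (y == 0)]
    xauc_ab = cross_auc(pos_a, neg_b)
    xauc_ba = cross_auc(pos_b, neg_a)
    return xauc_ab, xauc_ba, xauc_ab - xauc_ba


# Hypothetical toy data: scores, binary outcomes, and group labels.
scores = [0.9, 0.8, 0.55, 0.3, 0.6, 0.5, 0.7, 0.2]
y = [1, 1, 0, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(xauc_disparity(scores, y, group, "a", "b"))  # prints (1.0, 0.75, 0.25)
```

In this toy example every positive from group a outranks every negative from group b, but one of the four (group-b positive, group-a negative) pairs is misordered, so the disparity of 0.25 favors group a. The all-pairs comparison is chosen for clarity; a rank-based computation would scale better on large datasets.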
Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is how to evaluate the fairness of predictive risk scores used in high-stakes decision-making, particularly when those scores are used for more than a single binary classification decision. Specifically, the authors assess fairness from the perspective of bipartite ranking. Traditional fairness research focuses mainly on binary classification, measuring disparities between groups through confusion-matrix quantities such as the true positive rate and the false positive rate. These measures may not fully reflect the non-binary nature of risk scores or the variety of ways they are used in practice, for example in resource allocation decisions in criminal justice, healthcare, and credit.

To address this, the paper introduces a new metric, the xAUC disparity, to evaluate differences in how a risk score ranks members of different protected groups. The xAUC disparity is defined as the difference between the probability that a random positive example from one protected group is ranked above a random negative example from another protected group, and the probability of the reverse (a random positive example from the second group ranked above a random negative example from the first). The authors also decompose the bipartite ranking loss into components that involve the disparity between groups and components that reflect pure predictive ability within each group.

Applying xAUC analysis to audit risk scores for recidivism prediction, income prediction, and cardiac arrest prediction, the paper reveals disparities that are not evident from comparing within-group predictive performance alone. This suggests that the xAUC disparity gives a more complete picture of the fairness of risk scores in practice, especially when they inform resource allocation decisions.
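One elementary way to see how overall ranking performance splits into within-group and cross-group parts: the pooled empirical AUC equals a weighted sum of within-group AUCs and cross-group xAUCs, where the weight for the pair (g, h) is the share of all positives that belong to group g times the share of all negatives that belong to group h. The sketch below checks this identity under the same assumptions as before (binary labels, ties counted as 1/2). It is not necessarily the exact form of the paper's decomposition, and the names pairwise_auc and decompose_auc, as well as the toy data, are hypothetical.

```python
import numpy as np


def pairwise_auc(pos, neg):
    """Empirical P(positive score > negative score), ties counted as 1/2."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def decompose_auc(scores, y, group):
    """Split the pooled AUC into weighted within-group and cross-group terms.

    Returns (overall_auc, terms), where terms[(g, h)] = (weight, auc_gh):
    auc_gh compares positives of group g with negatives of group h, and
    weight = (share of positives in g) * (share of negatives in h).
    Diagonal terms (g == h) are within-group AUCs; off-diagonal ones are xAUCs.
    """
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    group = np.asarray(group)
    pos_all, neg_all = scores[y == 1], scores[y == 0]
    levels = sorted(set(group.tolist()))
    terms = {}
    for g in levels:
        for h in levels:
            pos_g = scores[(group == g) & (y == 1)]
            neg_h = scores[(group == h) & (y == 0)]
            if len(pos_g) == 0 or len(neg_h) == 0:
                continue  # empty cell: its weight is zero, so it contributes nothing
            weight = (len(pos_g) / len(pos_all)) * (len(neg_h) / len(neg_all))
            terms[(g, h)] = (weight, pairwise_auc(pos_g, neg_h))
    return pairwise_auc(pos_all, neg_all), terms


# Hypothetical toy data for a two-group audit.
scores = [0.9, 0.8, 0.55, 0.3, 0.6, 0.5, 0.7, 0.2]
y = [1, 1, 0, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
overall, terms = decompose_auc(scores, y, group)
recombined = sum(w * auc for w, auc in terms.values())
print(overall, recombined)  # prints 0.8125 0.8125
# The bipartite ranking loss 1 - AUC splits the same way, since the weights sum to 1.
```

Here the diagonal entries (within-group AUCs of 1.0 for group a and 0.5 for group b) are what a within-group comparison would report, while the off-diagonal entries terms[("a", "b")] and terms[("b", "a")] (1.0 and 0.75) carry the cross-group ranking disparity.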