The Role of Accuracy in Algorithmic Process Fairness Across Multiple Domains
Michele Albach,James R. Wright
DOI: https://doi.org/10.1145/3465456.3467620
2021-07-18
Abstract:Machine learning is often used to aid in human decision-making, sometimes for life-altering decisions like when determining whether or not to grant bail to a defendant or a loan to an applicant. Because of their importance, it is critical to ensure that the processes used to reach these decisions are considered fair. A common approach is to enforce some fairness constraint over the outcomes of a decision maker, but there is no single, generally-accepted definition of fairness. With notable exceptions, most of the literature on algorithmic fairness takes for granted that there will be an inherent trade-off between accuracy and algorithmic fairness. Additionally, most work focuses only on one or two domains, whereas machine learning techniques are used in an increasing number of distinct decision-making contexts with differing pertinent features. In this work, we consider six different decision-making domains: bail, child protective services, hospital resources, insurance rates, loans, and unemployment aid. We focus on the fairness of the process directly, rather than the outcomes. We also take a descriptive approach, using survey data to elicit the factors that lead a decision-making process to be perceived as fair. Specifically, we ask 2157 Amazon Mechanical Turk workers to rate the features used for algorithmic decision-making in one of the six domains as either fair or unfair, as well as to rate how much they agree or disagree with the assignments of eight previously (and one newly) proposed properties to the features. For example, a worker could be asked to rate the feature of "criminal history" as fair or unfair to use in bail decisions, and then rate how much they agree or disagree that "criminal history" is a reliable feature. We show that, in every domain, disagreements in fairness judgements can be largely explained by the assignments of properties (like reliability) to features (like criminal history). We also show that fairness judgements can be well predicted across domains by training the predictor using the property assignments from one domain's data and predicting in another. These findings imply that the properties act as moral determinants for fairness judgements, and that respondents reason similarly about the implications of the properties in all the decision-making domains that we consider. Although our results are mostly consistent across domains, we find some important differences within specific demographic groups in the hospital and insurance domains, indicating that at least some differences in fairness judgements are introduced by demographic differences. However, a single property usually holds the majority of the predictive power. With some exceptions, predictors learning from only the "increases accuracy" property perform better (in all domains) than predictors learning from any combination of the other seven properties, implying that the primary factor affecting respondents' perceptions of the fairness of using a feature for prediction is whether or not a feature increases the accuracy of the decision being made.