Abstract: A single algorithm drives an important health care decision for over 70 million people in the US. When health systems anticipate that a patient will have especially complex and intensive future health care needs, she is enrolled in a 'care management' program, which provides considerable additional resources: greater attention from trained providers and help with coordination of her care. To determine which patients will have complex future health care needs, and thus benefit from program enrollment, many systems rely on an algorithmically generated commercial risk score. In this paper, we exploit a rich dataset to study racial bias in a commercial algorithm that is deployed nationwide today in many of the US's most prominent Accountable Care Organizations (ACOs). We document significant racial bias in this widely used algorithm, using data on primary care patients at a large hospital. Blacks and whites with the same algorithmic risk scores have very different realized health. For example, the highest-risk black patients (those at the threshold where patients are auto-enrolled in the program) have significantly more chronic illnesses than white enrollees with the same risk score. We use detailed physiological data to show the pervasiveness of the bias: across a range of biomarkers, from HbA1c levels for diabetics to blood pressure control for hypertensives, we find significant racial health gaps conditional on risk score. This bias has significant material consequences for patients: it effectively means that white patients with the same health as black patients are far more likely to be enrolled in the care management program and to benefit from its resources. If we simulated a world without this gap in predictions, blacks would be auto-enrolled into the program at more than double the current rate. An unusual aspect of our dataset is that we observe not just the risk scores but also the input data and objective function used to construct them.
This provides a unique window into the mechanisms by which bias arises. The algorithm is given a data frame with (1) $Y_{it}$ (label), total medical expenditures ('costs') in year $t$; and (2) $X_{i,t-1}$ (features), fine-grained care utilization data in year $t-1$ (e.g., visits to cardiologists, number of x-rays). The algorithm's predicted risk of developing complex health needs is thus in fact predicted costs. And by this metric, one could easily call the algorithm unbiased: costs are very similar for black and white patients with the same risk scores. So far, the evidence is inconsistent with algorithmic bias: conditional on risk score, predictions do not favor whites or blacks. The fundamental problem we uncover is that when thinking about 'health care needs,' hospitals and insurers focus on costs. They use an algorithm whose specific objective is cost prediction, and from this perspective, predictions are accurate and unbiased. Yet from the social perspective, actual health, not just cost, also matters. This is where the problem arises: costs are not the same as health. While costs are a reasonable proxy for health (the sick do cost more, on average), they are an imperfect one: factors other than health, such as race, can drive cost. We find that blacks cost more than whites on average, but this gap can be decomposed into two countervailing effects. First, blacks bear a different and larger burden of disease, making them costlier. But this difference in illness is offset by a second factor: blacks cost less, holding constant their exact chronic conditions, a force that dramatically reduces the overall cost gap. Perversely, the fact that blacks cost less than whites conditional on health means that an algorithm that predicts costs accurately across racial groups will necessarily also generate biased predictions of health. The root cause of this bias lies not in the prediction procedure or the underlying data, but in the algorithm's objective function itself.
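The mechanism described above, a cost predictor that is accurate for both races on cost yet biased on health, can be reproduced in a toy simulation. This is a minimal sketch under stated assumptions, not the paper's model or data: the hypothetical `ACCESS` multipliers, the noise levels, the 0 to 8 range of chronic conditions, and the helper names are all made up, and for simplicity both groups are given the same illness distribution. The "algorithm" is the exact conditional expectation of next-year cost given prior-year utilization, so it has no prediction error on cost and never sees race. Within narrow score cells, the black-minus-white cost gap should come out near zero while the health gap is positive.

```python
import random
from collections import defaultdict

# Hypothetical multipliers: an equally sick black patient generates 25% less
# utilization (and hence cost) than a white patient. Made-up values.
ACCESS = {"white": 1.0, "black": 0.75}


def simulate(n=50_000, seed=0):
    """Synthetic (race, health, score, cost) tuples; every parameter is made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        race = rng.choice(("white", "black"))
        health = rng.randint(0, 8)  # number of active chronic conditions
        # Feature X_{i,t-1}: prior-year utilization, driven by illness and by access.
        util = health * ACCESS[race] + rng.gauss(0, 0.5)
        # Risk score = E[Y_{it} | X_{i,t-1}]: a perfectly accurate cost model
        # that never sees race or health directly.
        score = 1000 * util
        # Label Y_{it}: realized next-year cost, centered on the score.
        cost = score + rng.gauss(0, 500)
        rows.append((race, health, score, cost))
    return rows


def _mean(pairs, k):
    """Mean of the k-th element over a non-empty list of tuples."""
    return sum(p[k] for p in pairs) / len(pairs)


def gaps_at_same_score(rows, width=100.0):
    """Black-minus-white gaps in mean health and mean cost within narrow score
    cells, averaged over cells that contain both races, weighted by cell size."""
    cells = defaultdict(lambda: {"white": [], "black": []})
    for race, health, score, cost in rows:
        cells[round(score / width)][race].append((health, cost))
    health_gap = cost_gap = total = 0.0
    for groups in cells.values():
        white, black = groups["white"], groups["black"]
        if not white or not black:
            continue  # no within-score comparison possible in this cell
        size = len(white) + len(black)
        health_gap += size * (_mean(black, 0) - _mean(white, 0))
        cost_gap += size * (_mean(black, 1) - _mean(white, 1))
        total += size
    return health_gap / total, cost_gap / total


health_gap, cost_gap = gaps_at_same_score(simulate())
print(f"same score -> health gap: {health_gap:.2f} conditions, cost gap: {cost_gap:.0f}")
```

Setting `ACCESS["black"]` to 1.0 makes the two groups statistically identical in this toy model, and the health gap then disappears up to sampling noise, which matches the abstract's point that the bias stems from cost diverging from health across groups rather than from prediction error.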
This bias is akin to, but distinct from, 'mis-measured labels': it arises from the choice of label, not its measurement, and that choice is in turn a consequence of the differing objective functions of private actors in the health sector and of society. From the private perspective, the variable these actors focus on, cost, is being appropriately optimized. But our results hint at how algorithms may amplify a fundamental problem in health care as a whole: the externalities produced when health care providers focus too narrowly on financial motives, optimizing on costs to the detriment of health. In this sense, our results suggest that a pervasive problem in health care, namely incentives that induce health systems to focus on dollars rather than health, also has consequences for the way algorithms are built and monitored.
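The distinction between choosing a label and mis-measuring it can be illustrated by ranking one synthetic population by two different labels, each measured without error: realized cost and realized number of chronic conditions. This is a toy sketch, not the paper's analysis; the hypothetical `ACCESS` multipliers, the 3% cutoff standing in for an auto-enrollment threshold, the equal illness distributions across groups, and all function names are assumptions made for illustration.

```python
import random

# Hypothetical: equally sick black patients incur 25% lower cost. Made-up value.
ACCESS = {"white": 1.0, "black": 0.75}


def simulate(n=50_000, seed=1):
    """Synthetic (race, chronic_conditions, cost, tiebreak) tuples; parameters are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        race = rng.choice(("white", "black"))
        health = rng.randint(0, 8)  # number of active chronic conditions
        cost = 1000 * health * ACCESS[race] + rng.gauss(0, 500)
        rows.append((race, health, cost, rng.random()))  # last field breaks ties at random
    return rows


def black_share_of_top(rows, key, frac=0.03):
    """Share of black patients among the top `frac` of rows ranked by `key`."""
    k = max(1, int(len(rows) * frac))
    top = sorted(rows, key=key, reverse=True)[:k]
    return sum(race == "black" for race, *_ in top) / k


rows = simulate()
# Label choice 1: flag the patients with the highest realized cost.
by_cost = black_share_of_top(rows, key=lambda r: r[2])
# Label choice 2: flag the patients with the most chronic conditions (ties broken at random).
by_health = black_share_of_top(rows, key=lambda r: (r[1], r[3]))
print(f"black share of top 3%: by cost {by_cost:.2f}, by chronic conditions {by_health:.2f}")
```

In this setup, ranking by cost flags almost no black patients even though both groups have the same distribution of chronic conditions, while ranking by chronic conditions flags them at roughly their population share. No model is fitted, so the difference comes entirely from which label is chosen.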

Target specification bias, counterfactual prediction, and algorithmic fairness in healthcare

An Empirical Characterization of Fair Machine Learning For Clinical Risk Prediction

Counterfactual Reasoning for Fair Clinical Risk Prediction

Causal Inference and Counterfactual Prediction in Machine Learning for Actionable Healthcare

Racial Bias in Clinical and Population Health Algorithms: A Critical Review of Current Debates

Dissecting Racial Bias in an Algorithm that Guides Health Decisions for 70 Million People

Algorithmic fairness in artificial intelligence for medicine and healthcare

Fairness in Machine Learning Meets with Equity in Healthcare

Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

Intersectional consequences for marginal fairness in prediction models of emergency admissions

Exploring Bias and Prediction Metrics to Characterise the Fairness of Machine Learning for Equity-Centered Public Health Decision-Making: A Narrative Review

Sample Selection Bias in Machine Learning for Healthcare

Algorithmic Bias, Generalist Models, and Clinical Medicine

Algorithmic fairness in computational medicine

Fairness gaps in Machine learning models for hospitalization and emergency department visit risk prediction in home healthcare patients with heart failure

Counterfactual Fairness by Combining Factual and Counterfactual Predictions

Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis

Bias in artificial intelligence algorithms and recommendations for mitigation

Identifying and mitigating bias in algorithms used to manage patients in a pandemic

Algorithm Fairness in AI for Medicine and Healthcare

Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making