Abstract:Background: The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited. Objective: This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics. Methods: We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations. Results: The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations. Conclusions: Caution must be taken when interpreting fairness measures' face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities.

The Measurement and Mitigation of Algorithmic Bias and Unfairness in Healthcare AI Models Developed for the CMS AI Health Outcomes Challenge

Algorithm Fairness in AI for Medicine and Healthcare

Algorithmic fairness in artificial intelligence for medicine and healthcare

Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis

Algorithmic fairness in computational medicine

Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models

Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models

Racial Bias in Clinical and Population Health Algorithms: A Critical Review of Current Debates

An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

Bias in artificial intelligence algorithms and recommendations for mitigation

Guiding Principles to Address the Impact of Algorithm Bias on Racial and Ethnic Disparities in Health and Health Care

Addressing Algorithmic Bias and the Perpetuation of Health Inequities: An AI Bias Aware Framework

AI-Driven Healthcare: A Survey on Ensuring Fairness and Mitigating Bias

Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms

Practical, epistemic and normative implications of algorithmic bias in healthcare artificial intelligence: a qualitative study of multidisciplinary expert perspectives

Identifying and mitigating bias in algorithms used to manage patients in a pandemic

Toward fairness in artificial intelligence for medical image analysis: identification and mitigation of potential biases in the roadmap from data collection to model deployment

Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning

A Survey on Optimization and Machine Learning-Based Fair Decision Making in Healthcare