Abstract:Importance: Machine learning has potential to transform cancer care by helping clinicians prioritize patients for serious illness conversations. However, models need to be evaluated for unequal performance across racial groups (ie, racial bias) so that existing racial disparities are not exacerbated. Objective: To evaluate whether racial bias exists in a predictive machine learning model that identifies 180-day cancer mortality risk among patients with solid malignant tumors. Design, setting, and participants: In this cohort study, a machine learning model to predict cancer mortality for patients aged 21 years or older diagnosed with cancer between January 2016 and December 2021 was developed with a random forest algorithm using retrospective data from the Mount Sinai Health System cancer registry, Social Security Death Index, and electronic health records up to the date when databases were accessed for cohort extraction (February 2022). Exposure: Race category. Main outcomes and measures: The primary outcomes were model discriminatory performance (area under the receiver operating characteristic curve [AUROC], F1 score) among each race category (Asian, Black, Native American, White, and other or unknown) and fairness metrics (equal opportunity, equalized odds, and disparate impact) among each pairwise comparison of race categories. True-positive rate ratios represented equal opportunity; both true-positive and false-positive rate ratios, equalized odds; and the percentage of predictive positive rate ratios, disparate impact. All metrics were estimated as a proportion or ratio, with variability captured through 95% CIs. The prespecified criterion for the model's clinical use was a threshold of at least 80% for fairness metrics across different racial groups to ensure the model's prediction would not be biased against any specific race. Results: The test validation dataset included 43 274 patients with balanced demographics. Mean (SD) age was 64.09 (14.26) years, with 49.6% older than 65 years. A total of 53.3% were female; 9.5%, Asian; 18.9%, Black; 0.1%, Native American; 52.2%, White; and 19.2%, other or unknown race; 0.1% had missing race data. A total of 88.9% of patients were alive, and 11.1% were dead. The AUROCs, F1 scores, and fairness metrics maintained reasonable concordance among the racial subgroups: the AUROCs ranged from 0.75 (95% CI, 0.72-0.78) for Asian patients and 0.75 (95% CI, 0.73-0.77) for Black patients to 0.77 (95% CI, 0.75-0.79) for patients with other or unknown race; F1 scores, from 0.32 (95% CI, 0.32-0.33) for White patients to 0.40 (95% CI, 0.39-0.42) for Black patients; equal opportunity ratios, from 0.96 (95% CI, 0.95-0.98) for Black patients compared with White patients to 1.02 (95% CI, 1.00-1.04) for Black patients compared with patients with other or unknown race; equalized odds ratios, from 0.87 (95% CI, 0.85-0.92) for Black patients compared with White patients to 1.16 (1.10-1.21) for Black patients compared with patients with other or unknown race; and disparate impact ratios, from 0.86 (95% CI, 0.82-0.89) for Black patients compared with White patients to 1.17 (95% CI, 1.12-1.22) for Black patients compared with patients with other or unknown race. Conclusions and relevance: In this cohort study, the lack of significant variation in performance or fairness metrics indicated an absence of racial bias, suggesting that the model fairly identified cancer mortality risk across racial groups. It remains essential to consistently review the model's application in clinical settings to ensure equitable patient care.

Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes

Designing Medical Artificial Intelligence for In- and Out-Groups

Fair Machine Learning for Healthcare Requires Recognizing the Intersectionality of Sociodemographic Factors, a Case Study

Mitigating Machine Learning Bias Between High Income and Low-Middle Income Countries for Enhanced Model Fairness and Generalizability

Mitigating machine learning bias between high income and low–middle income countries for enhanced model fairness and generalizability

Algorithmic fairness in artificial intelligence for medicine and healthcare

Algorithm Fairness in AI for Medicine and Healthcare

Can AI Help Reduce Disparities in General Medical and Mental Health Care?

The Limits of Fair Medical Imaging AI In The Wild

Enhancing the fairness of AI prediction models by Quasi-Pareto improvement among heterogeneous thyroid nodule population

Positive-Sum Fairness: Leveraging Demographic Attributes to Achieve Fair AI Outcomes Without Sacrificing Group Gains

Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases

Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review

Fairness in Predicting Cancer Mortality Across Racial Subgroups

Algorithmic encoding of protected characteristics in image-based models for disease detection

An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

Quantifying Health Inequalities Induced by Data and AI Models

Intersectional consequences for marginal fairness in prediction models of emergency admissions

Identifying and mitigating bias in algorithms used to manage patients in a pandemic

Comparison of Methods to Reduce Bias From Clinical Prediction Models of Postpartum Depression

A translational perspective towards clinical AI fairness