Abstract:

Importance: Machine learning has the potential to transform cancer care by helping clinicians prioritize patients for serious illness conversations. However, models need to be evaluated for unequal performance across racial groups (ie, racial bias) so that existing racial disparities are not exacerbated.

Objective: To evaluate whether racial bias exists in a predictive machine learning model that identifies 180-day cancer mortality risk among patients with solid malignant tumors.

Design, setting, and participants: In this cohort study, a machine learning model to predict cancer mortality for patients aged 21 years or older diagnosed with cancer between January 2016 and December 2021 was developed with a random forest algorithm using retrospective data from the Mount Sinai Health System cancer registry, Social Security Death Index, and electronic health records up to the date when databases were accessed for cohort extraction (February 2022).

Exposure: Race category.

Main outcomes and measures: The primary outcomes were model discriminatory performance (area under the receiver operating characteristic curve [AUROC], F1 score) within each race category (Asian, Black, Native American, White, and other or unknown) and fairness metrics (equal opportunity, equalized odds, and disparate impact) for each pairwise comparison of race categories. True-positive rate ratios represented equal opportunity; both true-positive and false-positive rate ratios represented equalized odds; and predicted positive rate ratios (ratios of the percentage of patients predicted positive) represented disparate impact. All metrics were estimated as a proportion or ratio, with variability captured through 95% CIs. The prespecified criterion for the model's clinical use was a threshold of at least 80% for fairness metrics across different racial groups to ensure the model's prediction would not be biased against any specific race.

Results: The test validation dataset included 43 274 patients with balanced demographics. Mean (SD) age was 64.09 (14.26) years, with 49.6% older than 65 years. A total of 53.3% were female; 9.5%, Asian; 18.9%, Black; 0.1%, Native American; 52.2%, White; and 19.2%, other or unknown race; 0.1% had missing race data. A total of 88.9% of patients were alive, and 11.1% had died. The AUROCs, F1 scores, and fairness metrics maintained reasonable concordance among the racial subgroups: the AUROCs ranged from 0.75 (95% CI, 0.72-0.78) for Asian patients and 0.75 (95% CI, 0.73-0.77) for Black patients to 0.77 (95% CI, 0.75-0.79) for patients with other or unknown race; F1 scores, from 0.32 (95% CI, 0.32-0.33) for White patients to 0.40 (95% CI, 0.39-0.42) for Black patients; equal opportunity ratios, from 0.96 (95% CI, 0.95-0.98) for Black patients compared with White patients to 1.02 (95% CI, 1.00-1.04) for Black patients compared with patients with other or unknown race; equalized odds ratios, from 0.87 (95% CI, 0.85-0.92) for Black patients compared with White patients to 1.16 (95% CI, 1.10-1.21) for Black patients compared with patients with other or unknown race; and disparate impact ratios, from 0.86 (95% CI, 0.82-0.89) for Black patients compared with White patients to 1.17 (95% CI, 1.12-1.22) for Black patients compared with patients with other or unknown race.

Conclusions and relevance: In this cohort study, the lack of significant variation in performance or fairness metrics indicated an absence of racial bias, suggesting that the model fairly identified cancer mortality risk across racial groups. It remains essential to consistently review the model's application in clinical settings to ensure equitable patient care.
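The fairness metrics defined in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the names `GroupRates`, `group_rates`, `fairness_ratios`, and `meets_threshold` are hypothetical, the labels are toy data, and treating the 80% criterion symmetrically (so that a ratio above 1 is inverted before comparison) is an assumption, since the abstract states only "at least 80%". The abstract also does not say how the two equalized odds ratios were combined into one reported value, so the sketch returns them separately.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class GroupRates:
    tpr: float  # true-positive rate: share of actual positives predicted positive
    fpr: float  # false-positive rate: share of actual negatives predicted positive
    ppr: float  # predicted positive rate: share of all patients predicted positive


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> GroupRates:
    """Compute TPR, FPR, and predicted positive rate for one race category."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("each group needs at least one positive and one negative")
    return GroupRates(tpr=tp / pos, fpr=fp / neg, ppr=(tp + fp) / len(y_true))


def fairness_ratios(a: GroupRates, b: GroupRates) -> Dict[str, float]:
    """Pairwise ratios for group a compared with group b.

    Assumes group b has non-zero TPR, FPR, and predicted positive rate.
    """
    return {
        "equal_opportunity": a.tpr / b.tpr,   # TPR ratio
        "equalized_odds_tpr": a.tpr / b.tpr,  # equalized odds uses both the TPR ratio
        "equalized_odds_fpr": a.fpr / b.fpr,  # and the FPR ratio
        "disparate_impact": a.ppr / b.ppr,    # predicted positive rate ratio
    }


def meets_threshold(ratio: float, threshold: float = 0.8) -> bool:
    """Check a ratio against the 80% criterion.

    Assumption: ratios above 1 are inverted first, so the check does not
    depend on which group is placed in the numerator.
    """
    if ratio <= 0:
        return False
    return min(ratio, 1.0 / ratio) >= threshold


# Toy labels (1 = died within 180 days, 1 = flagged as high risk); not study data.
group_a = group_rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
group_b = group_rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

for name, value in fairness_ratios(group_a, group_b).items():
    print(f"{name}: {value:.2f} meets 80% criterion: {meets_threshold(value)}")
```

Under this symmetric reading, every ratio reported in the abstract (from 0.86 to 1.17) would satisfy the 80% criterion. The 95% CIs reported in the study are not reproduced here; a resampling method such as the bootstrap is one common way to obtain them, but the abstract does not state which method was used.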

Subpopulation-specific machine learning prognosis for underrepresented patients with double prioritized bias correction

Fairness in Predicting Cancer Mortality Across Racial Subgroups

Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases

Demographic bias in misdiagnosis by computational pathology models

Optimizing the fairness of survival prediction models for racial/ethnic subgroups: A study on predicting post-operative survival in stage IA and IB non-small cell lung cancer

Machine Learning Strategies for Improved Phenotype Prediction in Underrepresented Populations

B-111 Advancing Precision Medicine in Multiple Myeloma: Addressing Demographic Variabilities and Imbalanced Data in the NIH All of Us Research Program Cohort

Racial and Ethnic Bias in Risk Prediction Models for Colorectal Cancer Recurrence When Race and Ethnicity Are Omitted as Predictors

De-Biased Disentanglement Learning for Pulmonary Embolism Survival Prediction on Multimodal Data

Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier

Multi-task learning with dynamic re-weighting to achieve fairness in healthcare predictive modeling

Enhancing Fairness and Accuracy in Diagnosing Type 2 Diabetes in Young Population

Identifying racial disparities in adverse dispositions following major surgery for prostate cancer using machine learning

Addressing bias in prediction models by improving subpopulation calibration

A comparison of approaches to improve worst-case predictive model performance over patient subpopulations

Abstract 4800: Reducing health disparities for prostate adenocarcinoma by integrating multi-omics data via a multi-modal transfer learning approach

Intersectional consequences for marginal fairness in prediction models of emergency admissions

Improving lung cancer health equity by applying deep learning to low dose CT screening of minority and disadvantaged patients

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience

The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective

Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning