Abstract:
Background: The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities.
Objective: Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure.
Methods: Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. In stage 2, we reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment.
Results: In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 (random forest models). For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments (-17.81% to -3.37%; P<.05). A smaller disparity in the false positive rate, with higher rates for male patients, was significant in 13 out of 16 experiments (-0.48% to +9.77%; P<.05). That is, we observed an overprediction of disease for male patients (higher false positive rate) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored.
Conclusions: Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.
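The error-rate comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `error_rates_by_group`, the toy labels, and the male-minus-female sign convention for the gap are all assumptions made for illustration, and the study's significance testing is not reproduced here. The sketch computes the FNR and false positive rate separately for each sex from binary predictions.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fnr": ..., "fpr": ...}} for binary labels (1 = disease).

    Hypothetical helper for illustration; not taken from the study.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]  # patients who truly have the disease
        negatives = c["fp"] + c["tn"]  # patients who truly do not
        rates[group] = {
            # FNR: share of true cases the model missed (underdiagnosis)
            "fnr": c["fn"] / positives if positives else float("nan"),
            # FPR: share of healthy patients flagged as diseased
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


# Toy data (invented): 8 male and 4 female patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
sex = ["M"] * 8 + ["F"] * 4

rates = error_rates_by_group(y_true, y_pred, sex)

# Assumed sign convention: male rate minus female rate.
fnr_gap = rates["M"]["fnr"] - rates["F"]["fnr"]
fpr_gap = rates["M"]["fpr"] - rates["F"]["fpr"]
print(rates, fnr_gap, fpr_gap)
```

Under this assumed convention, a negative FNR gap means more missed diagnoses among female patients, and a positive false positive rate gap means more overprediction among male patients, which is the pattern the abstract reports.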

Demographic Reporting in Publicly Available Chest Radiograph Data Sets: Opportunities for Mitigating Sex and Racial Disparities in Deep Learning Models

Risk of Bias in Chest Radiography Deep Learning Foundation Models

Drop the shortcuts: image augmentation improves fairness and decreases AI detection of race and other demographics from medical images

The Limits of Fair Medical Imaging AI In The Wild

Disparities in the Demographic Composition of The Cancer Imaging Archive

Demographic reporting in biosignal datasets: a comprehensive analysis of the PhysioNet open access database

Medical imaging data science competitions should report dataset demographics and evaluate for bias

CheXclusion: Fairness gaps in deep chest X-ray classifiers

Algorithmic encoding of protected characteristics in image-based models for disease detection

Risk of Training Diagnostic Algorithms on Data with Demographic Bias

Acquisition parameters influence AI recognition of race in chest x-rays and mitigating these factors reduces underdiagnosis bias

Demographic bias in misdiagnosis by computational pathology models

Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis

Fairness in Cardiac Magnetic Resonance Imaging: Assessing Sex and Racial Bias in Deep Learning-Based Segmentation

Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study

Gender and Ethnicity Bias of Text-to-Image Generative Artificial Intelligence in Medical Imaging, Part 2: Analysis of DALL-E 3

Analyzing Racial Differences in Imaging Joint Replacement Registries Using Generative Artificial Intelligence: Advancing Orthopaedic Data Equity

Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging

Assessment of sex and racial biases in electronic health records of emergency department patients with acute coronary syndrome

Deep Learning Discovery of Demographic Biomarkers in Echocardiography

Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center Open Data Commons