Abstract:
Objective: Existing approaches to fairness evaluation often overlook systematic differences among comparison groups in the social determinants of health, such as demographics and socioeconomics, which can lead to inaccurate or even contradictory conclusions. This study evaluates racial disparities in predicting mortality among patients with chronic diseases using a fairness detection method that accounts for these systematic differences.
Methods: We created five datasets from Mass General Brigham's electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performance between Black and White individuals. We then compared the fairness evaluation obtained on the overall Black and White populations with the evaluation obtained on Black individuals and their matched White counterparts, identified by propensity score matching, among whom the systematic differences were mitigated.
Results: We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p < 0.001). In the matched Black and White subpopulations identified through propensity score matching, a few covariates still differed significantly, though at weaker significance levels: insurance type in the CHF cohort (p = 0.043), insurance type (p = 0.005) and education level (p = 0.016) in the CKD cohort, and body mass index in the dementia cohort (p = 0.041). No other covariates differed significantly. For the mortality prediction models across the five study cohorts, we compared the fairness evaluations before and after mitigating systematic differences. This comparison revealed significant differences in the CHF cohort: for the AdaBoost model, p = 0.021 for the F1 measure and p = 0.001 for sensitivity; for the MLP model, p = 0.014 for the F1 measure and p = 0.003 for sensitivity.
Discussion and conclusion: This study contributes to research on fairness assessment by focusing on the examination of systematic disparities, and it underscores the potential for revealing racial bias in machine learning models used in clinical settings.
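The matched-counterpart evaluation described in the Methods can be illustrated with a minimal sketch. The abstract does not state which matching algorithm, caliper, or matching ratio was used, so the greedy 1:1 nearest-neighbour matching without replacement, the `caliper` value, the function names, and the toy scores below are all assumptions made for illustration. Propensity scores are taken as already estimated (for example, by a logistic regression on the covariates listed in the Results).

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    treated, control: dicts mapping a patient id to a propensity score.
    Matching is without replacement; a pair is kept only if the score
    difference is within the caliper (an assumed value, not from the paper).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)
    pairs = []
    # Visit treated patients in ascending score order for reproducibility.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def sensitivity_and_f1(y_true, y_pred):
    """Return (sensitivity, F1) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return sens, f1


# Hypothetical propensity scores for three Black and four White patients.
black = {"b1": 0.62, "b2": 0.35, "b3": 0.90}
white = {"w1": 0.60, "w2": 0.30, "w3": 0.33, "w4": 0.10}

pairs = match_nearest(black, white)
print(pairs)  # [('b2', 'w3'), ('b1', 'w1')]; b3 has no control within the caliper

# Hypothetical labels and predictions for one group.
sens, f1 = sensitivity_and_f1([1, 1, 0, 1, 0], [1, 0, 0, 1, 1])
```

In this framing, sensitivity and F1 would be computed once for the overall Black and White groups and once for the matched pairs only, and the two fairness assessments then compared. The statistical test behind the reported p-values is not given in the abstract and is not reproduced here.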

Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction

Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data

Equity in Healthcare: Analyzing Disparities in Machine Learning Predictions of Diabetic Patient Readmissions

Assessing Social Determinants-Related Performance Bias of Machine Learning Models: A case of Hyperchloremia Prediction in ICU Population

Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases

Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier

Machine Learning and Bias in Medical Imaging: Opportunities and Challenges

An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

Fairness gaps in Machine learning models for hospitalization and emergency department visit risk prediction in home healthcare patients with heart failure

Disseminating the Risk Factors With Enhancement in Precision Medicine Using Comparative Machine Learning Models for Healthcare Data

Factors influencing clinician and patient interaction with machine learning-based risk prediction models: a systematic review

Algorithmic fairness and bias mitigation for clinical machine learning with deep reinforcement learning

A machine learning model for predicting, diagnosing, and mitigating health disparities in hospital readmission

Identifying and mitigating bias in algorithms used to manage patients in a pandemic

Fairness in Machine Learning Meets with Equity in Healthcare

Target specification bias, counterfactual prediction, and algorithmic fairness in healthcare

Bias Assessment and Data Drift Detection in Medical Image Analysis: A Survey

The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective

Can AI Help Reduce Disparities in General Medical and Mental Health Care?

Counterfactual Reasoning for Fair Clinical Risk Prediction

Evaluating and Improving the Performance and Racial Fairness of Algorithms for GFR Estimation