Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline Algorithm: Application to the ICU Length of Stay Prediction

Alexandra Kakadiaris
2024-01-01
Abstract: This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the Intensive Care Unit (ICU) length of stay (LOS). Highlighting the critical role of the ICU in managing critically ill patients, the study addresses the growing strain on ICU capacity. It emphasizes the significance of LOS prediction for resource allocation. The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction. While the XGBoost model performs well overall, disparities across race and insurance attributes reflect the need for tailored assessments and continuous monitoring. The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
Machine Learning, Computers and Society
What problem does this paper attempt to address?
This paper evaluates the fairness and bias of the MIMIC-IV dataset and of a baseline algorithm, an XGBoost binary classifier, in predicting length of stay (LOS) in the Intensive Care Unit (ICU). Specifically, the study focuses on:

1. **Imbalance in the dataset**: The paper reveals class imbalance in the MIMIC-IV dataset across demographic attributes such as race, gender, and insurance type.
2. **Differences in model performance**: The paper evaluates the XGBoost binary classification model across sensitive attributes such as race and insurance type, and finds significant differences in performance among certain groups.
3. **Fairness issues**: The study emphasizes the importance of fairness in healthcare predictive models, pointing out that a model's predictions may exacerbate health inequalities if fairness and equity are ignored.

Through these analyses, the paper aims to uncover potential biases in existing predictive models and to propose improvements that ensure fairness and accuracy across demographic groups.
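A per-group performance comparison of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual evaluation code: the choice of metrics (accuracy and true positive rate), the helper names `group_metrics` and `max_gap`, the group labels "A" and "B", and the toy labels are all assumptions made for the example. Here a label of 1 is assumed to mean a long ICU stay under whatever threshold defines the binary task.

```python
from collections import defaultdict


def group_metrics(y_true, y_pred, groups):
    """Return per-group sample count, accuracy, and true positive rate.

    Hypothetical helper: the metrics chosen here are an assumption,
    not necessarily the ones reported in the paper.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["correct"] += int(t == p)
        if t == 1:
            c["pos"] += 1
            c["tp"] += int(p == 1)
    return {
        g: {
            "n": c["n"],
            "accuracy": c["correct"] / c["n"],
            # TPR is undefined for a group with no positive cases.
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
        }
        for g, c in counts.items()
    }


def max_gap(metrics, key):
    """Largest difference in one metric between any two groups."""
    values = [m[key] for m in metrics.values() if m[key] is not None]
    return max(values) - min(values)


# Toy data for illustration only (not drawn from MIMIC-IV).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = group_metrics(y_true, y_pred, groups)
for name, m in sorted(metrics.items()):
    print(name, m)
print("accuracy gap:", max_gap(metrics, "accuracy"))
print("TPR gap:", max_gap(metrics, "tpr"))
```

The per-group sample count `n` is reported alongside each metric because a class imbalance across demographic attributes, as noted in point 1, makes estimates for small groups less reliable. In practice, `groups` would hold a sensitive attribute such as race or insurance type, and `y_pred` would come from the trained classifier.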