Developing machine learning models for prediction of mortality in the medical intensive care unit

Beatriz Nistal-Nuño
DOI: https://doi.org/10.1016/j.cmpb.2022.106663
Abstract:
Background and objective: Timely alerts of patient deterioration are essential for prompt medical intervention in the Medical Intensive Care Unit (MICU). Logistic Regression (LR) has been used to develop most conventional severity-of-illness scoring systems for anticipating the risk of mortality in the MICU. Machine Learning (ML) models such as probabilistic graphical models and Extreme Gradient Boosting (XGB) have demonstrated improved accuracy over LR in predicting patient outcomes. The aim was to compare three ML models with the SAPS, SAPS II, SAPS III, SOFA, serial SOFA, LODS, and OASIS for prediction of MICU mortality.
Methods: A Bayesian Network (BN), a Naïve Bayes network (NB), and an XGB model were developed. A total of 9893 adult MICU stays from the MIMIC-III database were studied. The primary outcome was MICU mortality prediction and the secondary outcome was 1-year mortality prediction. The data analyzed consisted of routine physiological measurements collected over 5 hours in the MICU, together with demographic and diagnosis/procedure features. Performance was evaluated with accuracy statistics and with discrimination and calibration measures. Limitations of the study were discussed.
Results: The AUROC for MICU mortality prediction was 0.919 for XGB, 0.905 for BN, and 0.864 for NB, while the conventional systems displayed much lower values, the best being the serial SOFA (0.814). The Diagnostic Odds Ratio was ≤7.099 for all the conventional systems, whereas it reached 30.115 for XGB and 22.648 for BN. The XGB achieved a sensitivity of 0.831 and a specificity of 0.86 with an acceptable precision (0.528); these values were much lower for the conventional systems. The Brier score was better for the ML models, with 0.072 for XGB and 0.081 for BN, except for the NB (0.119).
Conclusions: The XGB and BN substantially outperformed the conventional systems in discrimination, calibration, and the accuracy statistics assessed. The NB performed worse than the XGB and BN but improved on the conventional systems in discrimination and all accuracy statistics, although its calibration and 1-year mortality discrimination were inferior. The XGB showed the best performance among all models. These ML models have the potential to improve the monitoring of MICU patients, which must be evaluated in future studies.
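For readers unfamiliar with two of the metrics quoted above, the following is a minimal Python sketch of how the Diagnostic Odds Ratio and the Brier score are commonly defined. It is not the paper's code: the function names are hypothetical, the formulas are the standard textbook definitions, and the four-patient example used for the Brier score is invented purely for illustration.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Standard definition: DOR = [sens / (1 - sens)] * [spec / (1 - spec)].

    Equivalent to (TP / FN) / (FP / TN) on a confusion matrix.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0 is a perfect probabilistic forecast.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Sensitivity and specificity reported for XGB in the abstract.
dor_xgb = diagnostic_odds_ratio(0.831, 0.86)
print(f"DOR from rounded sensitivity/specificity: {dor_xgb:.1f}")

# Invented toy data (NOT from the study): 1 = died, 0 = survived.
toy_outcomes = [1, 0, 0, 1]
toy_probabilities = [0.9, 0.2, 0.1, 0.6]
print(f"Toy Brier score: {brier_score(toy_outcomes, toy_probabilities):.3f}")
```

Plugging in the abstract's rounded sensitivity and specificity gives a DOR of roughly 30.2, close to the reported 30.115; the small gap is presumably because the published inputs are rounded.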