A novel higher performance nomogram based on explainable machine learning for predicting mortality risk in stroke patients within 30 days based on clinical features on the first day ICU admission

Haoran Chen, Fengchun Yang, Yifan Duan, Lin Yang, Jiao Li
DOI: https://doi.org/10.1186/s12911-024-02547-7
IF: 3.298
2024-06-10
BMC Medical Informatics and Decision Making
Abstract: This study aimed to develop a higher-performance nomogram based on explainable machine learning methods to predict the risk of death within 30 days in stroke patients, based on clinical characteristics from the first day of intensive care unit (ICU) admission.
medical informatics
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This study aims to develop a high-performance nomogram based on explainable machine learning (EML) methods to predict the 30-day mortality risk of stroke patients from clinical features recorded on the first day of ICU admission.

### Background

Stroke is the second leading cause of death globally, and stroke patients in the ICU face high mortality and other adverse functional outcomes. Monitoring information collected in the ICU is crucial for improving patient treatment and prognosis. Traditional nomograms are usually built on logistic regression or Cox proportional hazards models, which assume a linear relationship between each predictor and the log-odds or log-hazard of the outcome; they therefore cannot capture the nonlinear relationships common in clinical practice unless those relationships are specified explicitly. Machine learning models, in contrast, can learn such nonlinear relationships from real-world data and often achieve higher accuracy. However, the "black box" nature of machine learning limits its clinical use. This study therefore combines the strengths of nomograms and machine learning to develop a higher-performance, easy-to-use clinical prediction nomogram.

### Methods

1. **Data Source**: Data on stroke patients were extracted from the MIMIC-IV and MIMIC-III databases.
2. **Machine Learning Model**: The LightGBM machine learning method and Shapley Additive Explanations (SHAP) were used to select clinical features and to define a cut-off point for each feature.
3. **Variable Evaluation**: Cox proportional hazards regression models and Kaplan-Meier survival curves were used to evaluate the selected features and their cut-off points.
4. **Nomogram Construction**: Two logistic regression nomograms were constructed to predict 30-day mortality: one using the original (continuous) variables and one using binary variables dichotomized at the EML-derived cut-off points.
5. **Performance Evaluation**: The performance of the two nomograms was compared at both the overall and the individual level.

### Results

- A total of 2982 stroke patients were included, with a 30-day mortality rate of 23.6%.
- The EML method identified 10 important variables and their cut-off points: SOFA score, minimum blood glucose, maximum sodium, age, mean blood oxygen saturation, maximum body temperature, maximum heart rate, minimum blood urea nitrogen, minimum white blood cell count, and Charlson Comorbidity Index.
- In both the Cox proportional hazards regression models and the Kaplan-Meier survival curves, 30-day mortality in the high-risk subgroup was significantly higher than in the low-risk subgroup.
- The EML-based nomogram outperformed the traditional nomogram at the overall level and improved significantly at the individual level, especially in patients with a low maximum body temperature.

### Conclusion

- The 10 clinical features selected from the first day of ICU admission require special attention in stroke patients.
- The nomogram based on explainable machine learning is expected to offer greater advantages in clinical application.

### Keywords

- Stroke
- Interpretable Machine Learning
- Nomogram
- Predictive Model
- MIMIC Database

By combining the strengths of explainable machine learning and nomograms, this study improves the accuracy of predicting 30-day mortality risk in stroke patients, helping clinicians better assess and manage patients' short-term mortality risk.
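The cut-off definition in the Machine Learning Model step can be illustrated with a small sketch. It assumes, as an illustration rather than the authors' confirmed procedure, that a feature's cut-off is the value at which its SHAP dependence curve changes sign, i.e. where the feature's contribution moves from lowering to raising predicted risk. In practice the SHAP values would come from the fitted LightGBM model (for example via `shap.TreeExplainer`); the functions `shap_cutoff` and `binarize` and the toy numbers below are hypothetical.

```python
from typing import Optional, Sequence


def shap_cutoff(feature_values: Sequence[float],
                shap_values: Sequence[float]) -> Optional[float]:
    """Hypothetical helper: feature value where the SHAP value first changes sign.

    Assumes the cut-off is the zero crossing of the SHAP dependence curve.
    Points are sorted by feature value and the crossing is linearly
    interpolated between neighbouring points with opposite-signed SHAP
    values. Returns None if there is no sign change.
    """
    pairs = sorted(zip(feature_values, shap_values))
    for (x0, s0), (x1, s1) in zip(pairs, pairs[1:]):
        if s0 == 0.0:
            return float(x0)  # exact zero at an observed point
        if s0 * s1 < 0:
            # Solve s0 + (s1 - s0) * (x - x0) / (x1 - x0) = 0 for x.
            return x0 + (x1 - x0) * (-s0) / (s1 - s0)
    if pairs and pairs[-1][1] == 0.0:
        return float(pairs[-1][0])
    return None


def binarize(value: float, cutoff: float, high_risk_above: bool = True) -> int:
    """Hypothetical helper: 1 if the value lies on the high-risk side of the cut-off."""
    return int(value > cutoff) if high_risk_above else int(value < cutoff)


# Toy numbers, not from the paper: SHAP contribution of age to predicted risk.
ages = [50, 60, 70, 80]
age_shap = [-0.4, -0.1, 0.2, 0.5]
cut = shap_cutoff(ages, age_shap)
print(round(cut, 2))      # 63.33
print(binarize(70, cut))  # 1 (high-risk side)
```

Real SHAP dependence plots are noisy, so the raw first crossing may need smoothing before it is used as a clinical threshold. `binarize` produces the kind of 0/1 indicator that a binary-variable nomogram would take as input.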
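The Variable Evaluation step compares the subgroups on either side of each cut-off. A minimal Kaplan-Meier estimator with administrative censoring at 30 days shows how the survival curves for a high-risk and a low-risk subgroup could be computed. The function name, data layout, and toy follow-up times are hypothetical; a library such as lifelines (`KaplanMeierFitter`, `CoxPHFitter`) would normally be used, and the Cox model itself is not reproduced here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int],
                 horizon: float = 30.0) -> List[Tuple[float, float]]:
    """Hypothetical helper: Kaplan-Meier estimate censored at `horizon` days.

    times: days from ICU admission to death or last follow-up.
    events: 1 if the patient died at that time, 0 if censored.
    Returns (time, survival probability) at each distinct death time.
    """
    # Deaths after the horizon count as survivors censored at the horizon.
    data = sorted((min(t, horizon), e if t <= horizon else 0)
                  for t, e in zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        # Group every patient with this time; patients censored at t are
        # still counted in at_risk for the deaths at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data (hypothetical) for patients above and below a cut-off.
high = kaplan_meier([2, 5, 5, 10, 40], [1, 1, 0, 1, 1])
low = kaplan_meier([20, 30, 35, 40], [1, 0, 1, 0])
print(high)  # approx. [(2, 0.8), (5, 0.6), (10, 0.3)]
print(low)   # [(20, 0.75)]
```

The 30-day mortality of a subgroup is one minus the last survival value in its curve, so in this toy example it is about 70% above the cut-off and 25% below it.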
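For the Nomogram Construction step, a logistic regression nomogram is a graphical rescaling of the model's linear predictor: each variable's contribution is mapped onto a 0 to 100 point axis, the points are summed, and the total is converted back into a predicted probability. The sketch below assumes the common convention that the variable with the widest effect range receives 100 points; the class `LogisticNomogram`, the coefficients, and the ranges are hypothetical and are not the paper's fitted model.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LogisticNomogram:
    """Hypothetical helper: points-based view of a fitted logistic regression."""
    intercept: float
    coefs: Dict[str, float]                 # beta per variable
    ranges: Dict[str, Tuple[float, float]]  # (min, max) drawn on each axis

    def __post_init__(self) -> None:
        # The widest effect range |beta| * (max - min) is assigned 100 points.
        self.max_effect = max(abs(b) * (self.ranges[k][1] - self.ranges[k][0])
                              for k, b in self.coefs.items())
        # The lowest-risk end of each axis scores 0 points.
        self.bases = {k: min(b * self.ranges[k][0], b * self.ranges[k][1])
                      for k, b in self.coefs.items()}

    def points(self, name: str, value: float) -> float:
        return 100.0 * (self.coefs[name] * value - self.bases[name]) / self.max_effect

    def total_points(self, patient: Dict[str, float]) -> float:
        return sum(self.points(k, patient[k]) for k in self.coefs)

    def probability(self, total: float) -> float:
        # Undo the rescaling to recover the linear predictor, then apply the logistic link.
        lp = self.intercept + sum(self.bases.values()) + total * self.max_effect / 100.0
        return 1.0 / (1.0 + math.exp(-lp))


# Hypothetical coefficients for continuous variables (NOT the paper's estimates).
nomo = LogisticNomogram(
    intercept=-1.25,
    coefs={"sofa": 0.15, "age": 0.03, "charlson": 0.10, "spo2_mean": -0.05},
    ranges={"sofa": (0, 20), "age": (18, 100), "charlson": (0, 15),
            "spo2_mean": (80, 100)},
)
patient = {"sofa": 10, "age": 70, "charlson": 4, "spo2_mean": 95}
total = nomo.total_points(patient)
print(round(total, 1), round(nomo.probability(total), 3))  # 123.7 0.119
```

Because the points are a linear rescaling, the probability read off the total-points axis equals the logistic regression's direct prediction. A binary-variable nomogram fits the same class with every range set to (0, 1) and each patient value replaced by its 0/1 indicator.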
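For the overall-level comparison in the Performance Evaluation step, the area under the ROC curve (AUC) is one common metric; the summary does not list the exact metrics used, so this choice is an assumption. The sketch computes AUC as the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor, counting ties as one half, which is the same quantity `sklearn.metrics.roc_auc_score` returns. The function `roc_auc` and the toy predictions are hypothetical and are not results from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC via pairwise comparison, ties counted as 1/2.

    labels: 1 if the patient died within 30 days, 0 otherwise.
    scores: predicted 30-day mortality risk from a nomogram.
    Runs in O(n_pos * n_neg), which is fine for a sketch but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions from two hypothetical models on the same six patients.
died = [1, 0, 1, 0, 0, 1]
model_a = [0.62, 0.30, 0.41, 0.45, 0.12, 0.70]
model_b = [0.66, 0.25, 0.52, 0.40, 0.10, 0.71]
print(roc_auc(died, model_a), roc_auc(died, model_b))  # about 0.889 and 1.0
```

Comparing two nomograms on the same patients in this way addresses only the overall level; the individual-level comparison described in the summary would require patient-by-patient analysis that this sketch does not attempt.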