Interpretable machine learning for early neurological deterioration prediction in atrial fibrillation-related stroke
Seong-Hwan Kim, Eun-Tae Jeon, Sungwook Yu, Kyungmi Oh, Chi Kyung Kim, Tae-Jin Song, Yong-Jae Kim, Sung Hyuk Heo, Kwang-Yeol Park, Jeong-Min Kim, Jong-Ho Park, Jay Chol Choi, Man-Seok Park, Joon-Tae Kim, Kang-Ho Choi, Yang Ha Hwang, Bum Joon Kim, Jong-Won Chung, Oh Young Bang, Gyeongmoon Kim, Woo-Keun Seo, Jin-Man Jung
DOI: https://doi.org/10.1038/s41598-021-99920-7
IF: 4.6
2021-10-18
Scientific Reports
Abstract: We aimed to develop a novel prediction model for early neurological deterioration (END) based on an interpretable machine learning (ML) algorithm for atrial fibrillation (AF)-related stroke and to evaluate the prediction accuracy and feature importance of ML models. Data from multicenter prospective stroke registries in South Korea were collected. After stepwise data preprocessing, we utilized logistic regression, support vector machine, extreme gradient boosting, light gradient boosting machine (LightGBM), and multilayer perceptron models. We used the Shapley additive explanation (SHAP) method to evaluate feature importance. Of the 3,213 stroke patients, the 2,363 who had arrived at the hospital within 24 h of symptom onset and had available information regarding END were included. Of these, 318 (13.5%) had END. The LightGBM model showed the highest area under the receiver operating characteristic curve (0.772; 95% confidence interval, 0.715–0.829). The feature importance analysis revealed that fasting glucose level and the National Institutes of Health Stroke Scale score were the most influential factors. Among ML algorithms, the LightGBM model was particularly useful for predicting END, as it revealed new and diverse predictors. Additionally, the effects of the features on the predictive power of the model were individualized using the SHAP method.
multidisciplinary sciences
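The abstract reports model performance as an area under the receiver operating characteristic curve with a 95% confidence interval, and an END rate of 318 of 2,363 patients. The sketch below illustrates how such figures can be computed. It is only an illustration under stated assumptions: the abstract does not say how the interval was obtained, so the percentile bootstrap used here is an assumption, the labels and scores are synthetic rather than registry data, and the names `roc_auc` and `bootstrap_auc_ci` are hypothetical helpers written for this example.

```python
import random


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formula; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# END rate from the counts given in the abstract.
end_rate_pct = round(318 / 2363 * 100, 1)

# Synthetic cohort: roughly 13.5% positives, with higher scores for positives.
rng = random.Random(42)
labels = [1 if rng.random() < 0.135 else 0 for _ in range(500)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"END rate: {end_rate_pct}%")
print(f"AUC: {auc:.3f} (95% CI, {lo:.3f}-{hi:.3f})")
```

The rank-sum form is used so that the example needs only the standard library. In practice, a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written `roc_auc`.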