Prediction of subjective cognitive decline after corpus callosum infarction by an interpretable machine learning-derived early warning strategy
Yawen Xu, Xu Sun, Yanqun Liu, Yuxin Huang, Meng Liang, Rui Sun, Ge Yin, Chenrui Song, Qichao Ding, Bingying Du, Xiaoying Bi
DOI: https://doi.org/10.3389/fneur.2023.1123607
IF: 3.4
2023-06-09
Frontiers in Neurology
Abstract: Background and purpose: Corpus callosum (CC) infarction is an extremely rare subtype of cerebral ischemic stroke. However, its cognitive-impairment symptoms often fail to draw patients' attention early, which seriously worsens the long-term prognosis, with consequences such as high mortality, personality changes, mood disorders, psychotic reactions, and financial burden. This study sought to develop and validate machine learning (ML) models for the early prediction of subjective cognitive decline (SCD) risk after CC infarction. Methods: This prospective study enrolled 213 (only 3.7%) CC infarction patients from a nine-year cohort comprising 8,555 patients with acute ischemic stroke. Telephone follow-up surveys were conducted one year after disease onset for patients with a definite diagnosis of CC infarction, and SCD was identified using the Behavioral Risk Factor Surveillance System (BRFSS) questionnaire. Based on the significant features selected by the least absolute shrinkage and selection operator (LASSO), seven ML models were established: Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), Gaussian Naïve Bayes (GNB), Complement Naïve Bayes (CNB), and Support Vector Machine (SVM). Their predictive performance was compared using several metrics. In addition, SHapley Additive exPlanations (SHAP) was used to examine the internal behavior of the best-performing ML classifier. Results: The LR model outperformed the other six ML models in predicting SCD after CC infarction, with an area under the receiver operating characteristic curve (AUC) of 77.1% in the validation set.
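The Results rank the seven models by AUC. The abstract does not say how AUC was computed; as a minimal sketch, the rank-based (Mann-Whitney) form below estimates the probability that a randomly chosen SCD case receives a higher predicted risk than a randomly chosen non-SCD case, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical, not data from the study.

```python
from itertools import product


def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a random positive > score of a random negative).

    labels: 1 for an outcome-positive case (e.g. SCD), 0 otherwise.
    scores: predicted risk for each case, in the same order as labels.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    # Compare every positive case with every negative case.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive/negative pairs are ordered correctly. The pairwise loop is quadratic, which is acceptable for a cohort of a few hundred patients; a sort-based version scales better for large validation sets.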
Using LASSO and SHAP analysis, we found that the infarction subregion of the CC, female sex, 3-month modified Rankin Scale (mRS) score, age, homocysteine, location of angiostenosis, neutrophil-to-lymphocyte ratio, pure CC infarction, and number of angiostenoses were, in order of importance, the top nine predictors of the LR model's output. We also identified the infarction subregion of the CC, female sex, 3-month mRS score, and pure CC infarction as factors independently associated with the cognitive outcome. Conclusion: Our study is the first to demonstrate that an LR model with 9 common variables performs best in predicting the risk of post-stroke SCD due to CC infarction. In particular, combining the LR model with a SHAP explainer could enable personalized risk prediction and serve as a decision-making tool for early intervention, given the poor long-term outcome of CC infarction.
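The abstract pairs the LR model with SHAP but does not give implementation details. For a logistic regression whose features are treated as independent (interventional SHAP), SHAP values have a closed form in log-odds units: feature j contributes w_j * (x_j - mean_j), with means taken over a background dataset, and a predictor ranking is commonly obtained from the mean absolute SHAP value per feature. The minimal pure-Python sketch below assumes exactly that setting; the weights, intercept, and data are hypothetical and do not reproduce the study's fitted model or its nine predictors.

```python
import math


def linear_shap(weights, intercept, background, x):
    """SHAP values of a logistic-regression model for one case, in log-odds units.

    Assumes independent features (interventional SHAP), under which the exact
    attribution of feature j is w_j * (x_j - mean_j), where mean_j is the
    feature's mean over the background dataset.
    """
    n = len(background)
    means = [sum(row[j] for row in background) / n for j in range(len(weights))]
    phi = [w * (xj - m) for w, xj, m in zip(weights, x, means)]
    # Expected log-odds over the background data (the SHAP "base value").
    base_value = intercept + sum(w * m for w, m in zip(weights, means))
    return base_value, phi


def log_odds(weights, intercept, x):
    """Linear predictor of the logistic model for one case."""
    return intercept + sum(w * xj for w, xj in zip(weights, x))


def sigmoid(z):
    """Convert log-odds to a predicted probability."""
    return 1.0 / (1.0 + math.exp(-z))


def mean_abs_shap(weights, intercept, background, data):
    """Global importance: mean absolute SHAP value of each feature over a dataset."""
    totals = [0.0] * len(weights)
    for x in data:
        _, phi = linear_shap(weights, intercept, background, x)
        totals = [t + abs(p) for t, p in zip(totals, phi)]
    return [t / len(data) for t in totals]


# Hypothetical model and data, for illustration only (not the study's fit).
weights = [0.8, -0.5, 1.2]
intercept = -1.0
background = [[0, 1, 0], [1, 0, 1], [1, 1, 0], [0, 0, 1]]
case = [1, 0, 1]

base, phi = linear_shap(weights, intercept, background, case)
# Additivity check: base value + sum of SHAP values equals the case's log-odds.
print(base + sum(phi), log_odds(weights, intercept, case))
print(sigmoid(log_odds(weights, intercept, case)))

# Rank features (by index) from most to least important.
importance = mean_abs_shap(weights, intercept, background, background)
ranking = sorted(range(len(weights)), key=lambda j: importance[j], reverse=True)
print(ranking)
```

The additivity property (base value plus the per-feature contributions reproduces the model's log-odds for that patient) is what makes such an explanation usable for individual risk prediction. Note that this closed form holds only for a linear model under the independence assumption; tree ensembles such as XGBoost or LightGBM need a different SHAP algorithm.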
neurosciences, clinical neurology