Abstract

Background: Prognosis, recurrence rates, and secondary prevention strategies vary significantly among the etiological subtypes of acute ischemic stroke (AIS). Machine learning (ML) techniques can uncover complex, non-linear relationships in medical data, which makes it possible to identify factors associated with etiological classification. However, research using ML algorithms to predict AIS etiology remains scarce.

Objective: We aimed to use interpretable ML algorithms to develop AIS etiology prediction models, identify the factors critical to etiological classification, and improve existing clinical categorization.

Methods: This study included patients from the Third China National Stroke Registry (CNSR-III). Nine models were trained to predict large artery atherosclerosis (LAA), small vessel occlusion (SVO), and cardioembolism (CE): Natural Gradient Boosting (NGBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Light Gradient Boosting Machine (LGBM), Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and logistic regression (LR). The data were randomly split 80:20 into training and test sets. For feature selection, we designed an SFS-XGB procedure with 10-fold cross-validation. The primary evaluation metrics were the area under the receiver operating characteristic curve (AUC) for discrimination, and the Brier score and calibration plots for calibration.

Results: A total of 5,213 patients were included: 2,471 (47.4%) with LAA, 2,153 (41.3%) with SVO, and 589 (11.3%) with CE. For both LAA and SVO, the AUCs of the ML models were significantly higher than that of the LR model (P < 0.001). The optimal SVO model (RF, AUC = 0.932) outperformed both the optimal LAA model (NGBoost, AUC = 0.917) and the optimal CE model (LGBM, AUC = 0.846). All models showed relatively satisfactory calibration. In further analysis, the optimal CE model identified 1,900 of 4,156 (45.7%) patients with stroke of undetermined etiology (SUE) as potential CE cases.

Conclusions: ML algorithms effectively classified patients with LAA, SVO, and CE, and they outperformed the LR model. The optimal ML model can identify potential CE patients among SUE patients. These newly identified predictive factors may complement the existing etiological classification system, helping clinicians to classify stroke etiology promptly and initiate optimal secondary prevention strategies.
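The workflow in the Methods (random 80:20 split, feature selection with 10-fold cross-validation, then AUC and Brier score on the held-out set) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions and is not the authors' code. The data are synthetic (`make_classification`). Each subtype is assumed to be modelled as a binary one-versus-rest task, for example LAA vs. non-LAA. "SFS" is assumed to mean sequential forward selection. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that no extra dependency is needed. The helper name `make_model`, the subset size of 4, and the model settings are arbitrary choices made for this example.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in data (assumption): one binary target, e.g. LAA vs. non-LAA.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)

# Random 80:20 training/test split (stratified here to preserve class balance).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

def make_model():
    # Hypothetical helper: gradient boosting as a stand-in for XGBoost.
    # Small settings keep the sketch fast.
    return GradientBoostingClassifier(n_estimators=30, max_depth=2, random_state=0)

# Sequential forward selection (assumed meaning of "SFS"): each candidate
# feature subset is scored by 10-fold cross-validated AUC on the training set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
sfs = SequentialFeatureSelector(make_model(), n_features_to_select=4,
                                direction="forward", scoring="roc_auc", cv=cv)
sfs.fit(X_train, y_train)
selected = sfs.get_support(indices=True)

# Refit on the selected features only, then score the held-out 20%.
model = make_model().fit(X_train[:, selected], y_train)
p_test = model.predict_proba(X_test[:, selected])[:, 1]

auc = roc_auc_score(y_test, p_test)       # discrimination
brier = brier_score_loss(y_test, p_test)  # calibration; lower is better
# Observed vs. mean predicted event rate per bin: the data behind a calibration plot.
prob_true, prob_pred = calibration_curve(y_test, p_test, n_bins=5)

print("selected feature indices:", selected.tolist())
print(f"AUC = {auc:.3f}, Brier = {brier:.3f}")
```

Feature selection runs only on the training portion, so the test-set AUC and Brier score are not inflated by information from the held-out 20%. Any scikit-learn-compatible classifier, including an XGBoost wrapper, could replace `make_model()` without changing the rest of the sketch.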

Stroke Risk Prediction Using Machine Learning: a Prospective Cohort Study of 0.5 Million Chinese Adults


Prediction-Driven Decision Support for Patients With Mild Stroke: A Model Based on Machine Learning Algorithms

Improving Cardiovascular Risk Prediction Through Machine Learning Modelling of Irregularly Repeated Electronic Health Records

Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults

Building and validating a predictive model for stroke risk in Chinese community-dwelling patients with chronic obstructive pulmonary disease using machine learning methods

Ischemic stroke prediction using machine learning in elderly Chinese population: The Rugao Longitudinal Ageing Study

Improving Stroke Risk Prediction in the General Population: A Comparative Assessment of Common Clinical Rules, a New Multimorbid Index, and Machine-Learning-Based Algorithms

Causative Classification of Ischemic Stroke by the Machine Learning Algorithm Random Forests

Development, Validation and Comparison of Multivariable Risk Scores for Prediction of Total Stroke and Stroke Types in Chinese Adults: a Prospective Study of 0.5 Million Adults

Using machine learning models to improve stroke risk level classification methods of China national stroke screening

Utility of single versus sequential measurements of risk factors for prediction of stroke in Chinese adults

Comparing the performance of machine learning and conventional models for predicting atherosclerotic cardiovascular disease in a general Chinese population

Predictive etiological classification of acute ischemic stroke through interpretable machine learning algorithms: a multicenter, prospective cohort study

Analyzing and predicting the risk of death in stroke patients using machine learning

Development of machine learning-based models to predict 10-year risk of cardiovascular disease: a prospective cohort study

Machine Learning Outperforms Traditional Logistic Regression and Offers New Possibilities for Cardiovascular Risk Prediction: A Study Involving 143,043 Chinese Patients with Hypertension

Development and internal validation of a multivariable prediction model for 6-year risk of stroke: a cohort study in middle-aged and elderly Chinese population

Development of Stroke Predictive Model in Community-Dwelling Population: A Longitudinal Cohort Study in Southeast China

Development and Validation of a 2-Year New-Onset Stroke Risk Prediction Model for People over Age 45 in China

Utilizing machine learning algorithms for the prediction of carotid artery plaques in a Chinese population