Abstract:

Background: The prognosis, recurrence rates, and secondary prevention strategies vary significantly among subtypes of acute ischemic stroke (AIS). Machine learning (ML) techniques can uncover intricate, non-linear relationships within medical data, enabling the identification of factors associated with etiological classification. However, research using ML algorithms to predict AIS etiology is lacking.

Objective: We aimed to use interpretable ML algorithms to develop AIS etiology prediction models, identify critical factors in etiology classification, and enhance existing clinical categorization.

Methods: This study included patients from the Third China National Stroke Registry (CNSR-III). Nine models, namely Natural Gradient Boosting (NGBoost), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Light Gradient Boosting Machine (LGBM), Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and logistic regression (LR), were used to predict large artery atherosclerosis (LAA), small vessel occlusion (SVO), and cardioembolism (CE), with the data randomly split 80:20 into training and test sets. We designed an SFS-XGB procedure with 10-fold cross-validation for feature selection. The primary evaluation metrics were the area under the receiver operating characteristic curve (AUC) for discrimination and the Brier score (or calibration plots) for calibration.

Results: A total of 5,213 patients were included, comprising 2,471 (47.4%) with LAA, 2,153 (41.3%) with SVO, and 589 (11.3%) with CE. For both LAA and SVO, the AUC values of the ML models were significantly higher than that of the LR model (P < 0.001). The optimal model for predicting SVO (AUC [RF model] = 0.932) outperformed the optimal LAA model (AUC [NGB model] = 0.917) and the optimal CE model (AUC [LGBM model] = 0.846). Each model displayed relatively satisfactory calibration. Further analysis showed that the optimal CE model could identify potential CE patients in the stroke of undetermined etiology (SUE) group, accounting for 1,900 of 4,156 patients (45.7%).

Conclusions: The ML algorithms effectively classified patients with LAA, SVO, and CE, demonstrating superior classification performance compared with the LR model. The optimal ML model can identify potential CE patients among SUE patients. The newly identified predictive factors may complement the existing etiological classification system, enabling clinicians to promptly categorize the etiology of a patient's stroke and initiate optimal strategies for secondary prevention.
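The evaluation setup described in the Methods (an 80:20 split, then AUC for discrimination and the Brier score for calibration) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the one-vs-rest framing per subtype and the stratified split are assumptions, and only two of the nine model families (RF and LR, via scikit-learn) are shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for registry data (assumption): three classes with
# proportions loosely mirroring LAA / SVO / CE in the abstract.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    n_classes=3, weights=[0.47, 0.41, 0.12], random_state=0,
)
labels = ["LAA", "SVO", "CE"]

# 80:20 train/test split; stratifying on the label is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

def evaluate(model, target):
    """Fit a one-vs-rest binary model for one subtype; return (AUC, Brier)."""
    model.fit(X_train, (y_train == target).astype(int))
    prob = model.predict_proba(X_test)[:, 1]
    truth = (y_test == target).astype(int)
    return roc_auc_score(truth, prob), brier_score_loss(truth, prob)

results = {}
for k, name in enumerate(labels):
    candidates = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "LR": LogisticRegression(max_iter=1000),
    }
    for model_name, model in candidates.items():
        auc, brier = evaluate(model, k)
        results[(name, model_name)] = (auc, brier)
        print(f"{name} {model_name}: AUC={auc:.3f}, Brier={brier:.3f}")
```

A higher AUC indicates better discrimination, while a lower Brier score indicates predicted probabilities closer to the observed outcomes. The numbers printed here come from synthetic data and bear no relation to the reported results.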

Predictive etiological classification of acute ischemic stroke through interpretable machine learning algorithms: a multicenter, prospective cohort study

Causative Classification of Ischemic Stroke by the Machine Learning Algorithm Random Forests

Prediction-Driven Decision Support for Patients With Mild Stroke: A Model Based on Machine Learning Algorithms

Prediction of large vessel occlusion for ischaemic stroke by using the machine learning model random forests

Development and validation of a machine learning-based prognostic risk stratification model for acute ischemic stroke

Heterogeneity in the Diagnosis and Prognosis of Ischemic Stroke Subtypes: 9-Year Follow-Up of 22,000 Cases in Chinese Adults

Predicting the Outcome of Patients with Aneurysmal Subarachnoid Hemorrhage: A Machine-Learning-Guided Scorecard

Explainable machine learning for predicting neurological outcome in hemorrhagic and ischemic stroke patients in critical care

Interpretable machine learning for prediction of clinical outcomes in acute ischemic stroke

Development and validation of comprehensive clinical outcome prediction models for acute ischaemic stroke in anterior circulation based on machine learning

New strategy for clinical etiologic diagnosis of acute ischemic stroke and blood biomarker discovery based on machine learning

Machine Learning to Predict Long-Term Cardiac-Relative Prognosis in Patients With Extra-Cardiac Vascular Disease

Enriching the Study Population for Ischemic Stroke Therapeutic Trials Using a Machine Learning Algorithm

Predicting prognosis in patients with stroke treated with intravenous alteplase through blood pressure changes: A machine learning-based approach

Development and Validation of a Machine Learning-Based Model of Ischemic Stroke Risk in the Chinese Elderly Hypertensive Population

Small Vessel Disease Burden Predicts Functional Outcomes in Patients with Acute Ischemic Stroke Using Machine Learning.

Predicting ischemic stroke patients' prognosis changes using machine learning in a nationwide stroke registry

Interpretable Machine Learning Modeling for Ischemic Stroke Outcome Prediction

Predicting 3-month poor functional outcomes of acute ischemic stroke in young patients using machine learning

Predicting functional outcome in ischemic stroke patients using genetic, environmental, and clinical factors: a machine learning analysis of population-based prospective cohort study

Machine learning is an effective method to predict the 3-month prognosis of patients with acute ischemic stroke