Machine learning for clinical outcome prediction in cerebrovascular and endovascular neurosurgery: systematic review and meta-analysis

Haydn Hoffman, Jason J Sims, Violiza Inoa-Acosta, Daniel Hoit, Adam S Arthur, Dan Y Draytsel, YeonSoo Kim, Nitin Goyal
DOI: https://doi.org/10.1136/jnis-2024-021759
2024-05-15
Abstract:
Background: Machine learning (ML) may be superior to traditional methods for clinical outcome prediction. We sought to systematically review the literature on ML for clinical outcome prediction in cerebrovascular and endovascular neurosurgery.
Methods: A comprehensive literature search was performed. We included original studies of patients undergoing cerebrovascular surgery or endovascular procedures that developed a supervised ML model to predict a postoperative outcome or complication.
Results: A total of 60 studies predicting 71 outcomes were included. Most cohorts were derived from single institutions (66.7%). The studies addressed stroke (n=32), subarachnoid hemorrhage (SAH; n=16), unruptured aneurysm (n=7), arteriovenous malformation (n=4), and cavernous malformation (n=1). Random forest was the best-performing model in 12 studies (20%), followed by XGBoost (13.3%). Among 42 studies in which the ML model was compared with a standard statistical model, ML was superior in 33 (78.6%). Of 10 studies in which the ML model was compared with a non-ML clinical prediction model, ML was superior in nine (90%). External validation was performed in 10 studies (16.7%). In studies predicting functional outcome after mechanical thrombectomy, the pooled area under the receiver operating characteristic curve (AUROC) of the test set performances was 0.84 (95% CI 0.79 to 0.88). For studies predicting outcomes after SAH, the pooled AUROCs for functional outcome and delayed cerebral ischemia were 0.89 (95% CI 0.76 to 0.95) and 0.90 (95% CI 0.66 to 0.98), respectively.
Conclusion: ML performs favorably for clinical outcome prediction in cerebrovascular and endovascular neurosurgery. However, multicenter studies with external validation are needed to ensure the generalizability of these findings.
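The abstract reports pooled AUROCs with confidence intervals but does not say how the per-study AUROCs were combined. As an illustration only, the sketch below shows one common approach: a DerSimonian-Laird random-effects model fitted on the logit scale and then back-transformed. That choice, the function name `pool_auroc_logit`, and the requirement for per-study standard errors of logit(AUROC) are all assumptions, not the authors' documented method. Pooling on the logit scale keeps the interval inside (0, 1) and makes it asymmetric around the point estimate, which matches the shape of the reported intervals.

```python
import math


def pool_auroc_logit(aurocs, ses_logit, z=1.96):
    """Hypothetical sketch: DerSimonian-Laird random-effects pooling of AUROCs.

    aurocs    -- per-study AUROC point estimates, each strictly between 0 and 1
    ses_logit -- per-study standard errors of logit(AUROC) (assumed available)
    Returns (pooled AUROC, lower bound, upper bound) back-transformed from logit.
    """
    # Transform each AUROC to the logit scale, where pooling is done.
    y = [math.log(a / (1 - a)) for a in aurocs]
    # Inverse-variance (fixed-effect) weights.
    w = [1 / s ** 2 for s in ses_logit]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (s ** 2 + tau2) for s in ses_logit]
    sw_re = sum(w_re)
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sw_re
    se_re = math.sqrt(1 / sw_re)

    # Back-transform the pooled estimate and its Wald interval to the AUROC scale.
    def expit(x):
        return 1 / (1 + math.exp(-x))

    return expit(y_re), expit(y_re - z * se_re), expit(y_re + z * se_re)


# Usage with made-up inputs (not data from the review):
pooled, lo, hi = pool_auroc_logit([0.80, 0.85, 0.88], [0.15, 0.20, 0.25])
print(f"pooled AUROC {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the interval is symmetric on the logit scale, it is asymmetric after back-transformation. When the pooled AUROC is above 0.5, the upper bound sits closer to the estimate than the lower bound does.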