Risk Factors for Perinatal Arterial Ischemic Stroke: A Machine Learning Approach
Ratika Srivastava, Lauran Cole, Kimberly Amador, Nils Daniel Forkert, Mary Dunbar, Michael I Shevell, Maryam Oskoui, Anna P Basu, Michael J Rivkin, Eilon Shany, Linda S de Vries, Deborah Dewey, Nicole Letourneau, Pauline Mouches, Michael D Hill, Adam Kirton
DOI: https://doi.org/10.1212/WNL.0000000000209393
IF: 9.9
Neurology
Abstract: Background and objectives: Perinatal arterial ischemic stroke (PAIS) is a focal vascular brain injury presumed to occur between the fetal period and the first 28 days of life. It is the leading cause of hemiparetic cerebral palsy. Multiple maternal, intrapartum, delivery, and fetal factors have been associated with PAIS, but studies are limited by modest sample sizes and complex interactions between factors. Machine learning approaches use large and complex data sets to enable unbiased identification of clinical predictors but have not yet been applied to PAIS. We combined large PAIS data sets and used machine learning methods to identify clinical PAIS factors and compare this data-driven approach with previously described literature-driven clinical prediction models.
Methods: Common data elements from 3 registries of patients with PAIS (the Alberta Perinatal Stroke Project, Canadian Cerebral Palsy Registry, and International Pediatric Stroke Study) and a longitudinal cohort of healthy controls (Alberta Pregnancy Outcomes and Nutrition Study) were used to identify potential predictors of PAIS. Inclusion criteria were term birth and idiopathic PAIS (absence of primary causative medical condition). Data including maternal/pregnancy, intrapartum, and neonatal factors were collected between January 2003 and March 2020. Common data elements were entered into a validated random forest machine learning pipeline to identify the most predictive features and develop a predictive model. Univariable analyses were completed post hoc to assess the relationship between each predictor and outcome.
Results: A machine learning model was developed using data from 2,571 neonates, including 527 cases (20%) and 2,044 controls (80%). With a mean of 21 features selected, the random forest machine learning approach predicted the outcome with approximately 86.5% balanced accuracy. Factors selected a priori through literature-driven variable selection that were also identified as most important by the machine learning model were maternal age, recreational substance exposure, tobacco exposure, intrapartum maternal fever, and low Apgar score at 5 minutes. Additional variables identified through machine learning included in utero alcohol exposure, infertility, miscarriage, primigravida, meconium, spontaneous vaginal delivery, neonatal head circumference, and 1-minute Apgar score. Overall, the machine learning model performed better (area under the curve [AUC] 0.93) than the literature-driven model (AUC 0.73).
Discussion: Machine learning may be an alternative, unbiased method to identify clinical predictors associated with PAIS. Identification of previously suggested and novel clinical factors requires cautious interpretation but supports the multifactorial nature of PAIS pathophysiology. Our results suggest that identification of neonates at risk of PAIS is possible.
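The Results report balanced accuracy rather than plain accuracy, which matters with a 20%/80% case-control split. The sketch below is an illustration only, not the authors' pipeline: it uses the class sizes given in the abstract (527 cases, 2,044 controls) and a hypothetical trivial classifier that labels every neonate a control. The function names and the trivial classifier are assumptions introduced here for illustration.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on cases) and specificity (recall on controls)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class sizes taken from the abstract.
N_CASES, N_CONTROLS = 527, 2044

# Hypothetical trivial classifier (an assumption, not a study result):
# it predicts "control" for everyone, so it finds no cases at all.
trivial = dict(tp=0, fn=N_CASES, tn=N_CONTROLS, fp=0)

print(round(accuracy(**trivial), 3))   # 0.795
print(balanced_accuracy(**trivial))    # 0.5
```

Plain accuracy rewards the trivial classifier with roughly 0.795 simply because controls dominate, while balanced accuracy stays at the chance level of 0.5. That is the reference point against which a balanced accuracy of approximately 86.5% can be read.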