Machine learning algorithms identify demographics, dietary features, and blood biomarkers associated with stroke records
Jundong Liu, Elizabeth L. Chou, Kui Kai Lau, Peter Y.M. Woo, Jun Li, Kei Hang Katie Chan
DOI: https://doi.org/10.1016/j.jns.2022.120335
IF: 4.4
2022-09-15
Journal of the Neurological Sciences
Abstract:
Objective: We conducted a comprehensive evaluation of features associated with stroke records.
Methods: We screened dietary nutrients, blood biomarkers, and clinical information from the National Health and Nutrition Examination Survey (NHANES) 2015–16 database to assess a self-reported history of stroke of any type (136 strokes, n = 4381). We computed feature importance, built machine learning (ML) models, developed a nomogram, and validated the nomogram on NHANES 2007–08, NHANES 2017–18, and the baseline UK Biobank. We calculated odds ratios without and with adjustment for sampling weights (OR and ORw, respectively).
Results: Clinical features had the best predictive power: compared with dietary nutrients and blood biomarkers, they yielded a 22.8% higher average area under the receiver operating characteristic curve (AUROC) across ML models. Restricting the models to the ten most important clinical features did not compromise predictive performance. Features positively associated with stroke included age, cigarette smoking, tobacco smoking, Caucasian or African American race, hypertension, diabetes mellitus, and a history of asthma; family income was negatively associated. A nomogram based on these key features performed well (AUROC between 0.753 and 0.822) on the test set, NHANES 2007–08, NHANES 2017–18, and the UK Biobank. Key features in the nomogram model included age (OR = 1.05, ORw = 1.06), Caucasian/African American race (OR = 2.68, ORw = 2.67), diabetes mellitus (OR = 2.30, ORw = 1.99), asthma (OR = 2.10, ORw = 2.41), hypertension (OR = 1.86, ORw = 2.10), and income (OR = 0.83, ORw = 0.81).
Conclusions: We identified key clinical features and built high-performing predictive models for assessing stroke records. A nomogram consisting of questionnaire-based variables could help identify stroke survivors and evaluate potential stroke risk.
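The two quantities the abstract reports, unweighted versus sampling-weighted odds ratios (OR vs ORw) and AUROC, can be illustrated with a minimal sketch. This is not the authors' pipeline: their ORs come from a multivariable nomogram model (presumably logistic regression), whereas the sketch below computes a crude, unadjusted OR for a single binary exposure from a 2x2 table, once counting every participant equally and once counting each participant by a survey sampling weight. The function names (`odds_ratio`, `auroc`) and the toy data are hypothetical.

```python
from typing import Optional, Sequence


def odds_ratio(exposed: Sequence[bool], outcome: Sequence[bool],
               weights: Optional[Sequence[float]] = None) -> float:
    """Crude (unadjusted) odds ratio from a 2x2 table.

    weights=None: every participant counts once (analogous to OR).
    weights given: each participant counts as its sampling weight (analogous to ORw).
    """
    if weights is None:
        weights = [1.0] * len(exposed)
    # Cells: a = exposed cases, b = exposed non-cases,
    #        c = unexposed cases, d = unexposed non-cases
    a = b = c = d = 0.0
    for e, y, w in zip(exposed, outcome, weights):
        if e and y:
            a += w
        elif e:
            b += w
        elif y:
            c += w
        else:
            d += w
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: an off-diagonal cell is empty")
    return (a * d) / (b * c)


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUROC as the probability that a random case scores above a random non-case (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from NHANES or UK Biobank)
exposed = [True, True, True, False, False, False, True, False]
outcome = [True, False, True, False, True, False, False, False]
weights = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

print(odds_ratio(exposed, outcome))           # 3.0
print(odds_ratio(exposed, outcome, weights))  # 4.5

scores = [0.9, 0.8, 0.7, 0.2]
labels = [True, False, True, False]
print(auroc(scores, labels))                  # 0.75
```

Upweighting a single exposed case shifts the OR from 3.0 to 4.5, which is why OR and ORw can differ when survey weights are applied. The pairwise AUROC loop is O(n²) and meant only for illustration; a rank-based formula scales better on survey-sized data.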
neurosciences, clinical neurology