Alzheimer’s disease risk prediction using automated machine learning

Xiaoyi Raymond Gao, Yi‐Ju Li, Eden R. Martin
DOI: https://doi.org/10.1002/alz.053953
2021-12-01
Abstract: Alzheimer’s disease (AD) is the most common late‐onset neurodegenerative disease. About 5.4 million Americans are living with AD. There is currently no cure for AD, which makes early prediction crucial: identifying individuals at increased risk gives them a better chance of benefiting from treatments. Risk prediction models are typically based on a limited number of predictors, possibly with sub‐optimal performance. Here, we explore a state‐of‐the‐art automated machine learning (AutoML) framework for AD risk prediction, which can handle hundreds of predictors, including non‐traditional variables, with automatic feature engineering and model selection. We developed an AutoML model that aggregates polygenic risk scores (PRSs) and baseline individual characteristics (e.g., non‐genetic factors) for predicting AD. The PRSs were derived using summary statistics from the genome‐wide association studies of the Alzheimer Disease Genetics Consortium (ADGC) dataset (n = 19,918). The model was applied to 455,233 participants in the UK Biobank (UKBB) who did not have AD at baseline, to predict development of AD by the final observation (n = 1,452 developed AD). Our model was based on H2O AutoML, a framework that automatically selects hyperparameters, tunes ensembles of ML models, and carries out model assessment. The area under the receiver operating characteristic curve (AUC) for AD risk prediction was over 0.86. Polygenic risk scores ranked second only to age in feature importance. Furthermore, our AutoML model identified predictors that are not typically considered in traditional prediction models, such as an individual’s overall health rating and usual walking pace. Our AutoML model improves the accuracy of AD risk prediction by efficiently exploring numerous predictors and ensemble models while greatly reducing manual coding hours. It also uncovered novel predictors for AD.
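The two quantities the abstract relies on, a polygenic risk score and the AUC, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the common definition of a PRS as a sum of risk‐allele dosages weighted by per‐variant GWAS effect sizes, and it computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The function names, effect sizes, genotypes, and labels are all hypothetical toy values, and the sketch ranks individuals by PRS alone rather than by an AutoML ensemble over many predictors.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         effect_sizes: Sequence[float]) -> float:
    """Sum of risk-allele dosages (0, 1, or 2) weighted by per-variant effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs in which the case scores higher.

    Tied pairs count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: three variants, four individuals.
effect_sizes = [0.5, 0.25, 1.0]   # assumed per-variant weights, not real GWAS estimates
genotypes = [
    [0, 1, 0],
    [2, 0, 1],
    [1, 1, 0],
    [0, 0, 2],
]
labels = [0, 1, 0, 1]             # 1 = developed AD, 0 = did not (made up)

scores = [polygenic_risk_score(g, effect_sizes) for g in genotypes]
print("PRS per individual:", scores)
print("AUC of PRS alone:", auc(scores, labels))
```

In the study itself, the PRS is one input among hundreds of predictors passed to H2O AutoML, and the reported AUC of over 0.86 refers to that combined model.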