Development of a Novel Dementia Risk Prediction Model in the General Population: A Large, Longitudinal, Population-Based Machine-Learning Study

Jia You, Ya-Ru Zhang, Hui-Fu Wang, Ming Yang, Feng, Jin-Tai Yu, Wei Cheng
DOI: https://doi.org/10.1016/j.eclinm.2022.101665
IF: 15.1
2022-01-01
EClinicalMedicine
Abstract:

Background: Existing dementia risk models are limited to known risk factors and traditional statistical methods. We aimed to use machine learning (ML) to develop a novel dementia prediction model by leveraging a phenotypically rich variable space of 366 features covering multiple domains of health-related data.

Methods: In this longitudinal, population-based cohort from the UK Biobank (UKB), 425,159 non-demented participants were enrolled from 22 recruitment centres across the UK between March 1, 2006 and October 31, 2010. We implemented a data-driven strategy to identify predictors from 366 candidate variables covering a comprehensive range of genetic and environmental factors, and developed an ML model to predict incident dementia and Alzheimer's Disease (AD) within five years, within ten years, and over the full follow-up (median 11.9 [interquartile range 11.2-12.5] years).

Findings: During a follow-up of 5,023,337 person-years, 5,287 and 2,416 participants developed dementia and AD, respectively. A novel UKB dementia risk prediction (UKB-DRP) model was established, comprising ten predictors: age, ApoE e4, pairs matching time, leg fat percentage, number of medications taken, reaction time, peak expiratory flow, mother's age at death, long-standing illness, and mean corpuscular volume. The model was internally evaluated for discrimination and calibration using five-fold cross-validation, and was further compared with existing prediction scales. The UKB-DRP model achieved high discriminative accuracy for dementia (AUC 0.848 ± 0.007) and even higher accuracy for AD (AUC 0.862 ± 0.015). The model was well calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.92), and its predictive power held up across the different incidence time groups. More importantly, the model clearly outperformed existing models: the Cardiovascular Risk Factors, Aging, and Incidence of Dementia Risk Score (AUC 0.705 ± 0.008), the Dementia Risk Score (AUC 0.752 ± 0.007), and the Australian National University Alzheimer's Disease Risk Index (AUC 0.584 ± 0.017). The model was internally validated in the general population of European ancestry and White ethnicity; further validation with independent datasets is therefore necessary to confirm these findings.

Interpretation: Our ML-based UKB-DRP model incorporates ten easily accessible predictors with solid predictive power for incident dementia and AD within five years, within ten years, and over longer follow-up, and can be used to identify individuals at high risk of dementia and AD in the general population.

Copyright © 2022 The Author(s). Published by Elsevier Ltd.
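The discrimination figures above are reported as an AUC with a spread (for example 0.848 ± 0.007). As a minimal sketch of how such a summary can be computed, the Python below calculates the AUC from held-out risk scores and then reports the mean and standard deviation across folds. It is an illustration only, not the authors' code: the function names `auc` and `summarize_folds` are hypothetical, and reading the "±" value as the standard deviation across the five folds is an assumption that the abstract does not confirm.

```python
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a participant who developed the outcome, 0 otherwise.
    scores: predicted risk, higher meaning higher risk.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    # Fraction of case/non-case pairs ranked correctly; ties count as half.
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def summarize_folds(folds):
    """Mean and sample SD of the AUC over held-out folds.

    folds: list of (labels, scores) pairs, one per held-out fold. The scores
    are assumed to come from a model fitted on the remaining folds.
    """
    per_fold = [auc(labels, scores) for labels, scores in folds]
    return statistics.mean(per_fold), statistics.stdev(per_fold)


if __name__ == "__main__":
    # Toy held-out predictions for two folds (made-up numbers, not study data).
    toy_folds = [
        ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
        ([0, 1, 0, 1], [0.20, 0.90, 0.30, 0.60]),
    ]
    mean_auc, sd_auc = summarize_folds(toy_folds)
    print(f"AUC {mean_auc:.3f} ± {sd_auc:.3f}")
```

The pairwise formulation is quadratic in the number of participants, so it suits small illustrations only; at UK Biobank scale a rank-based implementation from an established library would be the practical choice.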