Abstract: Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detecting patients at high risk of AD/ADRD is crucial for timely interventions that modify risk factors, aimed primarily at preventing cognitive decline and dementia, and thus at enhancing quality of life and reducing health care costs. This study investigates both knowledge-driven (where domain experts identify useful features) and data-driven (where machine learning models select useful features among all available data elements) approaches to the early prediction of AD/ADRD using real-world electronic health records (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control design. We also examined the early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach, and models based on Random Forests (RF) achieved the best performance for the knowledge-driven approach. Among all models, GBT using the data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and identified potential challenges for future investigations.

### Competing Interest Statement

The authors have declared no competing interest.
### Funding Statement

This study was partially supported by the Ed and Ethel Moore Alzheimer's Disease Research Program of the Florida Department of Health (FL DOH #9AZ14) and a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2018C3-14754). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The University of Florida IRB (IRB201900182) approved this study and assigned it to the exempt category.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Data sharing is not applicable to this article.
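As an illustration of the 0-, 1-, 3-, and 5-year prediction setting and the AUC metric described in the abstract, the following is a minimal sketch and not the authors' pipeline. The helper names `features_before_window` and `auc`, the toy event codes, and the day-based cutoff are all assumptions made for illustration; the study itself trained GBT and RF models on real EHR data.

```python
from datetime import date, timedelta


def features_before_window(events, onset, years):
    """Keep only event codes recorded at least `years` years before `onset`.

    `events` is a list of (date, code) pairs. For controls, `onset` would be
    a matched index date (an assumption of this sketch).
    """
    cutoff = onset - timedelta(days=round(365.25 * years))
    return {code for when, code in events if when <= cutoff}


def auc(labels, scores):
    """Rank-based AUC: the chance that a random case outscores a random
    control, counting ties as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical patient history; the codes are illustrative labels only.
history = [
    (date(2010, 3, 1), "hypertension"),
    (date(2014, 6, 1), "depression"),
    (date(2015, 11, 20), "memory_loss"),
]
onset_date = date(2016, 1, 1)

# The further back the window, the less information is available.
for window in (0, 1, 3, 5):
    kept = features_before_window(history, onset_date, window)
    print(window, sorted(kept))

# Toy model scores for two cases (label 1) and two controls (label 0).
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The shrinking feature set at longer windows is one plausible reading of why the reported AUC falls from 0.7976 at 0 years to 0.6798 at 5 years, since records closest to onset are excluded from the model's input.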
