Abstract: Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detecting patients at high risk of AD/ADRD is crucial for timely interventions to modify risk factors and for the primary prevention of cognitive decline and dementia, and thus for enhancing quality of life and reducing health care costs. This study investigates both knowledge-driven approaches (where domain experts identify useful features) and data-driven approaches (where machine learning models select useful features from all available data elements) for early prediction of AD/ADRD using real-world electronic health record (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control design. We also examined the early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach and models based on Random Forests (RF) achieved the best performance for the knowledge-driven approach. Among all models, GBT using the data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and identified potential challenges for future investigations.

### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

This study was partially supported by the Ed and Ethel Moore Alzheimer's Disease Research Program from the Florida Department of Health (FL DOH #9AZ14) and a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2018C3-14754). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The University of Florida IRB (IRB201900182) approved this study and assigned it to the exempt category.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Data sharing is not applicable to this article.

Automated Medical Records Review for Mild Cognitive Impairment and Dementia

Identification of Outcome-Oriented Progression Subtypes from Mild Cognitive Impairment to Alzheimer’s Disease Using Electronic Health Records

Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data

Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study

Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study

Data-driven discovery of probable Alzheimer's disease and related dementia subphenotypes using electronic health records

Rule-Based Identification of Individuals with Mild Cognitive Impairment or Alzheimer’s Disease Using Clinical Notes from the United States Veterans Affairs Healthcare System

Assess the Documentation of Cognitive Tests and Biomarkers in Electronic Health Records via Natural Language Processing for Alzheimer's Disease and Related Dementias

Using Machine Learning and Electronic Health Record (EHR) Data for the Early Prediction of Alzheimer's Disease and Related Dementias

Using Deep Learning to Identify Patients with Cognitive Impairment in Electronic Health Records

Predicting Risk of Alzheimer's Diseases and Related Dementias with AI Foundation Model on Electronic Health Records

Early Prediction of Alzheimer’s Disease and Related Dementias Using Electronic Health Records

Development and Validation of a Deep Learning Model for Earlier Detection of Cognitive Decline From Clinical Notes in Electronic Health Records

Early Prediction of Alzheimer's Disease and Related Dementias Using Real-World Electronic Health Records

Prevalence of Mild Cognitive Impairment and Alzheimer's Disease Identified in Veterans in the United States

SCD-Tron: Leveraging Large Clinical Language Model for Early Detection of Cognitive Decline from Electronic Health Records

Fully Automated Discrimination of Alzheimer's Disease Using Resting-State Electroencephalography Signals

Personalized screening and risk profiles for Mild Cognitive Impairment via a Machine Learning Framework: Implications for general practice