Abstract: Alzheimer’s disease (AD) and AD-related dementias (ADRD) are a class of neurodegenerative diseases affecting about 5.7 million Americans. There is no cure for AD/ADRD. Current interventions have modest effects and focus on attenuating cognitive impairment. Detecting patients at high risk of AD/ADRD is crucial for timely interventions that modify risk factors for the primary prevention of cognitive decline and dementia, thereby enhancing quality of life and reducing health care costs. This study investigates both knowledge-driven (where domain experts identify useful features) and data-driven (where machine learning models select useful features among all available data elements) approaches for early prediction of AD/ADRD using real-world electronic health records (EHR) data from the University of Florida (UF) Health system. We identified a cohort of 59,799 patients and examined four widely used machine learning algorithms following a standard case-control design. We also examined the early prediction of AD/ADRD using patient information from 0, 1, 3, and 5 years before the disease onset date. The experimental results showed that models based on Gradient Boosting Trees (GBT) achieved the best performance for the data-driven approach and models based on Random Forests (RF) achieved the best performance for the knowledge-driven approach. Among all models, GBT using a data-driven approach achieved the best area under the curve (AUC) scores of 0.7976, 0.7192, 0.6985, and 0.6798 for 0-, 1-, 3-, and 5-year prediction, respectively. We also examined the top features identified by the machine learning models and compared them with the knowledge-driven features identified by domain experts. Our study demonstrated the feasibility of using electronic health records for the early prediction of AD/ADRD and identified potential challenges for future investigations.

### Competing Interest Statement

The authors have declared no competing interest.
### Funding Statement

This study was partially supported by the Ed and Ethel Moore Alzheimer's Disease Research Program of the Florida Department of Health (FL DOH #9AZ14) and a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-2018C3-14754). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The University of Florida IRB (IRB201900182) approved this study and assigned it to the exempt category.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Data sharing is not applicable to this article.
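The abstract describes predicting AD/ADRD from patient information available 0, 1, 3, and 5 years before the onset date and comparing models by AUC. The sketch below illustrates those two ideas only; it is not the authors' pipeline. The function names, the sample records, the diagnosis codes, and the choice to include records dated exactly on the cutoff are all assumptions made for illustration.

```python
from datetime import date


def window_cutoff(index_date: date, years_before: int) -> date:
    """Last date whose records may be used for a k-year-early prediction."""
    try:
        return index_date.replace(year=index_date.year - years_before)
    except ValueError:
        # Index date is Feb 29 and the target year is not a leap year.
        return index_date.replace(year=index_date.year - years_before, day=28)


def records_in_window(records, index_date, years_before):
    """Keep (record_date, code) pairs dated on or before the cutoff (assumed inclusive)."""
    cutoff = window_cutoff(index_date, years_before)
    return [(d, code) for d, code in records if d <= cutoff]


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical patient history: (record date, illustrative diagnosis code).
records = [
    (date(2012, 3, 1), "I10"),
    (date(2015, 6, 15), "E11.9"),
    (date(2017, 9, 30), "R41.3"),
]
onset = date(2018, 1, 10)  # hypothetical onset (index) date

for k in (0, 1, 3, 5):
    usable = records_in_window(records, onset, k)
    print(f"{k}-year window:", [code for _, code in usable])

# Hypothetical model scores for two cases (1) and two controls (0).
print("AUC:", auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Longer windows discard more of each patient's recent history, which is one plausible reason the reported AUC falls as the prediction horizon grows from 0 to 5 years.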

Assess the Documentation of Cognitive Tests and Biomarkers in Electronic Health Records via Natural Language Processing for Alzheimer's Disease and Related Dementias

Digitized Biomarkers Utilizing Human‐Computer Interaction Sensing Technology for Early Auxiliary Diagnosis of Alzheimer’s Disease

Identification of Outcome-Oriented Progression Subtypes from Mild Cognitive Impairment to Alzheimer’s Disease Using Electronic Health Records

Develop and validate a computable phenotype for the identification of Alzheimer's disease patients using electronic health record data

Feasibility of Identifying Factors Related to Alzheimer’s Disease and Related Dementia in Real-World Data

Automated Medical Records Review for Mild Cognitive Impairment and Dementia

Early Prediction of Alzheimer’s Disease and Related Dementias Using Electronic Health Records

Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study

Cognitive Biomarker Prioritization in Alzheimer's Disease using Brain Morphometric Data

Extraction of sleep information from clinical notes of Alzheimer's disease patients using natural language processing

Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study

Data-driven discovery of probable Alzheimer's disease and related dementia subphenotypes using electronic health records

Early Prediction of Alzheimer's Disease and Related Dementias Using Real-World Electronic Health Records

Detecting Alzheimer's Disease Using Natural Language Processing of Referential Communication Task Transcripts

Early Prediction of Alzheimer's Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans

Extraction of Sleep Information from Clinical Notes of Patients with Alzheimer's Disease Using Natural Language Processing

Extraction of Geriatric Syndromes From Electronic Health Record Clinical Notes: Assessment of Statistical Natural Language Processing Methods