Automated Medical Records Review for Mild Cognitive Impairment and Dementia
Ruoqi Wei, Stephanie S Buss, Rebecca Milde, Marta Fernandes, Daniel Sumsion, Elijah Davis, Wan-Yee Kong, Yiwen Xiong, Jet Veltink, Samvrit Rao, Tara M Westover, Lydia Petersen, Niels Turley, Arjun Singh, Sudeshna Das, Valdery Moura Junior, Manohar Ghanta, Aditya Gupta, Jennifer Kim, Alice D Lam, Katie L Stone, Emmanuel Mignot, Dennis Hwang, Lynn Marie Trotti, Gari D Clifford, Umakanth Katwa, Robert J Thomas, Shibani Mukerji, Sahar F Zafar, M Brandon Westover, Haoqi Sun
DOI: https://doi.org/10.21203/rs.3.rs-5046441/v1
2024-11-06
Abstract: Objectives: Unstructured and structured data in electronic health records (EHR) are a rich source of information for research and quality improvement studies. However, extracting accurate information from EHR is labor-intensive. Here we introduce an automated EHR phenotyping model to identify patients with Alzheimer's disease and related dementias (ADRD) or mild cognitive impairment (MCI). Methods: We assembled medical notes and associated International Classification of Diseases (ICD) codes and medication prescriptions for 3,626 adult outpatients seen at two hospitals between February 2015 and June 2022. Ground truth annotations regarding the presence vs. absence of a diagnosis of MCI or ADRD were determined through manual chart review. Indicators extracted from these records included keywords and phrases in unstructured clinical notes, prescriptions of medications associated with MCI/ADRD, and ICD codes associated with MCI/ADRD. We trained a regularized logistic regression model to predict the ground truth annotations. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity, precision (positive predictive value), recall (sensitivity), and F1 score (harmonic mean of precision and recall). Results: Thirty percent of patients in the cohort carried a diagnosis of MCI/ADRD based on manual review. When evaluated on a held-out test set, the best model, which used clinical notes, ICD codes, and medications, achieved an AUROC of 0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The estimated overall accuracy for patients randomly selected from EHRs was 99.88%. Conclusion: Automated EHR phenotyping accurately identifies patients with MCI/ADRD based on clinical notes, ICD codes, and medication records. This approach holds potential for large-scale MCI/ADRD research utilizing EHR databases.
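As a reading aid, the threshold-based metrics named in the abstract (accuracy, sensitivity/recall, specificity, precision, and F1) can be sketched from a binary confusion matrix as follows. This is a minimal illustration and not the authors' code: the `ConfusionCounts` type, the `classification_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # MCI/ADRD patients correctly flagged
    fp: int  # patients without MCI/ADRD incorrectly flagged
    tn: int  # patients without MCI/ADRD correctly not flagged
    fn: int  # MCI/ADRD patients missed


def _safe_div(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    precision = _safe_div(c.tp, c.tp + c.fp)    # positive predictive value
    recall = _safe_div(c.tp, c.tp + c.fn)       # sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)
    accuracy = _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only; these are NOT the study's confusion matrix.
example = classification_metrics(ConfusionCounts(tp=90, fp=5, tn=95, fn=10))
```

AUROC and AUPRC are not covered by this sketch because they depend on the model's continuous scores rather than on a single thresholded confusion matrix.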