Can longitudinal electronic health record data identify patients at higher risk of developing long COVID?

Priya Shanmugam, Molly Bair, Emma Pendl-Robinson, Xindi C. Hu
DOI: https://doi.org/10.1101/2024.02.08.24302528
2024-02-09
Abstract: With hundreds of millions of COVID-19 infections to date, a considerable portion of the population has developed or will develop long COVID. Understanding the prevalence, risk factors, and healthcare costs of long COVID can be of significant societal importance. To investigate the utility of large-scale electronic health record (EHR) data in identifying and predicting long COVID, we analyzed data from the National COVID Cohort Collaborative (N3C), a longitudinal EHR data repository from 65 sites in the US with over 8 million COVID-19 patients. We characterized the prevalence of long COVID using several types of definitions to illustrate their relative strengths and weaknesses. We then developed a machine learning model to predict the risk of developing long COVID using demographic factors and comorbidities recorded in the EHR. The risk factors for long COVID include patient age, sex, smoking status, and comorbidities characterized by the Charlson Comorbidity Index.
Health Informatics