Interoperability of phenome-wide multimorbidity patterns: a comparative study of two large-scale EHR systems

Nick Strayer, Tess J. Vessels, Karmel W Choi, Siwei Zhang, Yajing Li, Lide Han, Brian Sharber, Ryan S Hsi, Cosmin A Bejan, Alexander G Bick, Justin M Balko, Douglas B Johnson, Lee E Wheless, Quinn S Wells, Elizabeth J Phillips, Wesley H. Self, Jill M Pulley, Consuelo H Wilkins, Qingxia Chen, Tina Hartert, Michael R Savona, Yu Shyr, Dan M Roden, Jordan W Smoller, Douglas M Ruderfer, Yaomin Xu
DOI: https://doi.org/10.1101/2024.03.28.24305045
2024-05-27
Abstract:
Background: Electronic health records (EHRs) are increasingly used for studying multimorbidities. However, concerns about accuracy and completeness, and the fact that EHRs are designed primarily for billing and administrative purposes, raise questions about the consistency and reproducibility of EHR-based multimorbidity research.
Methods: Using phecodes to represent the disease phenome, we analyzed pairwise comorbidity strengths using a dual logistic regression approach and constructed multimorbidity as an undirected weighted graph. We assessed the consistency of the multimorbidity networks within and between two major EHR systems at local (nodes and edges), meso (neighboring patterns), and global (network statistics) scales. We present case studies to identify disease clusters and uncover clinically interpretable disease relationships. We provide an interactive web tool and a knowledge base combining data from multiple sources for online multimorbidity analysis.
Findings: Analyzing data from 500,000 patients across Vanderbilt University Medical Center and Mass General Brigham health systems, we observed a strong correlation in disease frequencies (Kendall's tau = 0.643) and comorbidity strengths (Pearson rho = 0.79). Consistent network statistics across EHRs suggest similar structures of multimorbidity networks at various scales. Comorbidity strengths and similarities of multimorbidity connection patterns align with the disease genetic correlations. Graph-theoretic analyses revealed a consistent core-periphery structure, implying efficient network clustering through threshold graph construction. Using hydronephrosis as a case study, we demonstrated the network's ability to uncover clinically relevant disease relationships and provide novel insights.
Interpretation: Our findings demonstrate the robustness of large-scale EHR data for studying phenome-wide multimorbidities. The alignment of multimorbidity patterns with genetic data suggests potential utility for uncovering the shared biology of diseases. The consistent core-periphery structure offers analytical insights for discovering complex disease interactions. This work also sets the stage for advanced disease modeling, with implications for precision medicine.
Funding: VUMC Biostatistics Development Award, the National Institutes of Health, and the VA CSRD.