Enhancing patient representation learning from electronic health records through predicted family relations

Xiayuan Huang, Jatin Arora, Abdullah Mesut Erzurumluoglu, Daniel Lam, Boehringer Ingelheim – Global Computational Biology and Digital Sciences, Hongyu Zhao, Zhihao Ding, Zuoheng Wang, Johann de Jong
DOI: https://doi.org/10.1101/2024.03.12.24304163
2024-03-13
Abstract: Artificial intelligence and machine learning are powerful tools for analyzing electronic health records (EHRs) in healthcare research. Despite the recognized importance of family health history, healthcare research often treats individual patients as independent samples, overlooking family relations. To address this gap, we present ALIGATEHR, which models predicted family relations in a graph attention network and integrates this information with a medical ontology representation. Taking disease risk prediction as a use case, we demonstrate that explicitly modeling family relations significantly improves predictions across the disease spectrum. We then show how ALIGATEHR’s attention mechanism, which links patients’ disease risk to their relatives’ clinical profiles, successfully captures genetic aspects of diseases using only EHR diagnosis data. Finally, we use ALIGATEHR to successfully distinguish the two main inflammatory bowel disease subtypes (Crohn’s disease and ulcerative colitis), illustrating its great potential for improving patient representation learning for predictive and descriptive modeling of EHRs.
Health Informatics