Collaborative Graph Learning with Auxiliary Text for Temporal Event Prediction in Healthcare

Chang Lu, Chandan K. Reddy, Prithwish Chakraborty, Samantha Kleinberg, Yue Ning
DOI: https://doi.org/10.48550/arXiv.2105.07542
2021-05-17
Abstract: Accurate and explainable health event predictions are becoming crucial for healthcare providers to develop care plans for patients. The availability of electronic health records (EHR) has enabled machine learning advances in providing these predictions. However, many deep learning based methods are not satisfactory in solving several key challenges: 1) effectively utilizing disease domain knowledge; 2) collaboratively learning representations of patients and diseases; and 3) incorporating unstructured text. To address these issues, we propose a collaborative graph learning model to explore patient-disease interactions and medical domain knowledge. Our solution is able to capture structural features of both patients and diseases. The proposed model also utilizes unstructured text data by employing an attention regulation strategy and then integrates attentive text features into a sequential learning process. We conduct extensive experiments on two important healthcare problems to show the competitive prediction performance of the proposed method compared with various state-of-the-art models. We also confirm the effectiveness of learned representations and model interpretability by a set of ablation and case studies.
Machine Learning, Artificial Intelligence, Information Retrieval
What problem does this paper attempt to address?
This paper addresses three key problems:

1. **Effectively utilizing disease domain knowledge**: Many existing deep learning methods do not fully use disease domain knowledge when processing electronic health record (EHR) data. They usually consider only the vertical relationships (hierarchical links) between diseases and their ancestor nodes and ignore horizontal disease associations such as complications. As a result, potentially complex relationships between diseases are missed, which hurts prediction performance.
2. **Collaboratively learning patient-disease interactions**: Existing methods often treat patients as independent samples and represent each patient by diagnostic information alone, so they fail to capture similarities between patients. For example, patients with the same diagnosis may share other diseases. This similarity matters for predicting new-onset diseases from the records of other patients, but existing methods do not exploit it.
3. **Integrating unstructured text data**: Unstructured text in EHRs, such as clinical notes, contains valuable features, including patients' physical signs and medical histories. Most models do not make full use of this data, which leads to unsatisfactory prediction performance and limits interpretability.

To solve these problems, the paper proposes a model based on Collaborative Graph Learning (CGL), which can:

- Use the hierarchical structure of medical domain knowledge to represent diseases through hierarchical embeddings.
- Perform collaborative graph learning on the observation graph and the ontology graph to learn hidden features of patients and diseases.
- Encode clinical notes with a TF-IDF-modified attention mechanism and combine the result with time-series learning, improving both the prediction performance and the interpretability of the model.

Specifically, the CGL model addresses these problems as follows:

- **Hierarchical embedding**: Virtual child nodes are created recursively so that non-leaf nodes are padded with virtual leaf nodes, forming a complete hierarchy.
- **Collaborative graph learning**: A patient-disease observation graph and a disease ontology graph are constructed, and the hidden features of patients and diseases are learned through graph aggregation.
- **Attention regulation strategy**: The TF-IDF-modified attention mechanism automatically highlights key words in clinical notes, quantifies their contributions, and helps explain the prediction results.

Finally, experiments on the MIMIC-III dataset show that the model outperforms existing state-of-the-art models on both diagnosis prediction and heart failure prediction.
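
The hierarchical embedding step described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each disease code comes with its path of hierarchy nodes from the top level down, and it pads shorter paths with virtual child nodes until every code sits at the same depth. The function name `pad_to_full_depth`, the `/virtual` suffix, and the toy codes are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical naming convention for padded nodes (an assumption, not from the paper).
VIRTUAL_SUFFIX = "/virtual"


def pad_to_full_depth(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Pad every top-level-to-code path to the depth of the deepest path.

    `paths` maps a disease code to its hierarchy nodes, ordered from the
    top level down to the code itself.
    """
    if not paths:
        return {}
    full_depth = max(len(path) for path in paths.values())
    padded: Dict[str, List[str]] = {}
    for code, path in paths.items():
        if not path:
            raise ValueError(f"empty hierarchy path for code {code!r}")
        new_path = list(path)
        # Keep attaching a virtual child to the current last node until the
        # code becomes a (virtual) leaf at the full depth.
        while len(new_path) < full_depth:
            new_path.append(new_path[-1] + VIRTUAL_SUFFIX)
        padded[code] = new_path
    return padded


# Toy hierarchy with made-up codes (not real ICD entries).
toy_paths = {
    "D1.1.1": ["D1", "D1.1", "D1.1.1"],
    "D1.2": ["D1", "D1.2"],
    "D2": ["D2"],
}
padded_paths = pad_to_full_depth(toy_paths)
for code, path in padded_paths.items():
    print(code, "->", path)
```

After padding, every code has exactly one node per level. One plausible use, again an assumption rather than something the summary states, is to look up an embedding for each level and concatenate them into a fixed-size disease representation.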