Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax

Hao-Ren Yao, Nairen Cao, Katina Russell, Der-Chen Chang, Ophir Frieder, Jeremy Fineman
DOI: https://doi.org/10.1145/3648695
2024-02-21
Abstract: Learning Electronic Health Records (EHRs) representation is a preeminent yet under-discovered research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning shows great success on self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome the previous problems. Unlike the state-of-the-art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting nodes and graph representations on those two manifold views through the commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art. Theoretically, the variation on distance metrics naturally creates different views as data augmentation without changing graph structures.
Machine Learning, Computers and Society
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **how to learn effective representations of electronic health records (EHRs) with self-supervised methods, without label supervision**. Specifically, the authors focus on overcoming the limitations of existing methods on large-scale unlabeled EHR data: current methods rely on task-specific label supervision, and their performance degrades on data with complex temporal structure.

### Problem Background

1. **Importance of EHR Representation Learning**
   - EHR representation learning is the basis for personalized medicine and clinical decision support.
   - Most existing methods rely on task-specific label supervision, which is not applicable in large-scale unlabeled data scenarios.
   - The success of self-supervised learning methods (such as contrastive learning) on images and in other fields has inspired their application to EHRs.
2. **Limitations of Existing Methods**
   - Current methods rely on task-specific label supervision and cannot be applied to large-scale unlabeled data.
   - Data augmentation methods (such as randomly cropping historical records or introducing Gaussian noise) may blur the views, increasing the similarity of negative samples and reducing the similarity of positive samples.
   - The complex temporal structure and sparse medical coding make it difficult for traditional methods to optimize the reconstruction loss.

### Solutions Proposed in the Paper

To solve the above problems, the authors propose **Graph Kernel Infomax (GKI)**, a self-supervised contrastive learning method based on graph kernels. The main features of GKI are as follows:

1. **Data Augmentation without Modifying the Graph Structure**
   - Nodes are embedded into two geometrically different manifold views through Kernel Subspace Augmentation, instead of changing the graph structure.
   - The two views are created using different distance metrics (Euclidean distance and spherical distance), which avoids the problems caused by directly modifying the graph structure.
2. **Theoretical Analysis and Geometric Explanation**
   - Adjusting the distance metric in the underlying manifold achieves the same effect as graph structure augmentation.
   - By maximizing the mutual information (MI) between nodes and graphs in these two manifolds, patient representations can be applied directly to linear models without fine-tuning or label supervision.
3. **Experimental Verification**
   - The effectiveness of GKI in clinical downstream tasks is verified on a large-scale public EHR dataset.
   - The generalization and robustness of the method are verified on widely used graph classification benchmark datasets.

### Formula Summary

- **Node Representation**
  \[ h_i^{(l)} = f_\theta(X_i, A_i) \in \mathbb{R}^{n \times d} \]
  where \(f_\theta\) represents the GNN layer.
- **Clustering Loss**
  \[ L_{\text{rec}}^{(l)} = \left\| h_i^{(l)} - H_i^{(l)} C_i^{(l)} \right\|_F \]
  where \(H_i^{(l)} = f_C^{(l)}(h_i^{(l)})\) is the clustering assignment matrix at the \(l\)-th layer.
- **Graph-level Kernel Feature Mapping**
  \[ \Phi(G_i) = \sum_{l=1}^{L} \left[ \sum_{j=1}^{n} \phi(h_i^{(l)})_j \right] \in \mathbb{R}^{KL} \]
- **Contrastive Loss**
  \[ \ell_{\text{cl}}(p, q) = -\log \frac{\exp(\text{sim}(p, q)/\tau)}{\sum_{q' \in B} \mathbb{1}[p \neq q'] \exp(\text{sim}(p, q')/\tau)} \]
  where \(\text{sim}(\cdot, \cdot)\) represents cosine similarity.

Through these innovations, GKI can effectively learn EHR representations without label supervision.
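
As an illustration of the contrastive loss \(\ell_{\text{cl}}\) above, the following NumPy sketch evaluates it over a small batch. Several details are assumptions that the summary does not state: the batch \(B\) is taken to be the concatenation of embeddings from the two views, the positive \(q\) for each anchor \(p\) is the same item in the other view, and \(\tau = 0.5\) is only a placeholder value. The function names `cosine_similarity_matrix` and `contrastive_loss` are hypothetical and are not taken from the paper's code.

```python
import numpy as np


def cosine_similarity_matrix(z):
    """Pairwise cosine similarity between the rows of z (shape: m x d)."""
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    z = z / np.maximum(norms, 1e-12)  # guard against zero-length rows
    return z @ z.T


def contrastive_loss(view_a, view_b, tau=0.5):
    """Mean of l_cl(p, q) over all anchors in a two-view batch.

    Assumption: row i of view_a and row i of view_b embed the same item,
    so they form the positive pair (p, q). Every other row in the
    concatenated batch B acts as a negative.
    """
    n = view_a.shape[0]
    batch = np.concatenate([view_a, view_b], axis=0)  # B, shape (2n, d)
    logits = cosine_similarity_matrix(batch) / tau    # sim(p, q') / tau

    # Indicator 1[p != q']: drop each anchor's similarity with itself.
    np.fill_diagonal(logits, -np.inf)

    # Index of the positive for each anchor: i <-> i + n.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log of the denominator (log-sum-exp per row).
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    log_numerator = logits[np.arange(2 * n), positives]

    return float(np.mean(log_denominator - log_numerator))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder embeddings standing in for the two manifold views.
    euclidean_view = rng.normal(size=(8, 16))
    spherical_view = euclidean_view + 0.1 * rng.normal(size=(8, 16))
    print("contrastive loss:", contrastive_loss(euclidean_view, spherical_view))
```

Because the positive pair also appears in the denominator, the loss is non-negative, and it shrinks as the two views of the same item move closer together relative to the other items in the batch.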