How to Bridge Spatial and Temporal Heterogeneity in Link Prediction? A Contrastive Method

Yu Tai, Xinglong Wu, Hongwei Yang, Hui He, Duanjing Chen, Yuanming Shao, Weizhe Zhang
2024-11-01
Abstract: Temporal Heterogeneous Networks play a crucial role in capturing the dynamics and heterogeneity inherent in various real-world complex systems, rendering them a noteworthy research avenue for link prediction. However, existing methods fail to capture the fine-grained differential distribution patterns and temporal dynamic characteristics, which we refer to as spatial heterogeneity and temporal heterogeneity. To overcome such limitations, we propose a novel Contrastive Learning-based Link Prediction model, CLP, which employs a multi-view hierarchical self-supervised architecture to encode spatial and temporal heterogeneity. Specifically, aiming at spatial heterogeneity, we develop a spatial feature modeling layer to capture the fine-grained topological distribution patterns from node- and edge-level representations, respectively. Furthermore, aiming at temporal heterogeneity, we devise a temporal information modeling layer to perceive the evolutionary dependencies of dynamic graph topologies from time-level representations. Finally, we encode the spatial and temporal distribution heterogeneity from a contrastive learning perspective, enabling a comprehensive self-supervised hierarchical relation modeling for the link prediction task. Extensive experiments conducted on four real-world dynamic heterogeneous network datasets verify that our CLP consistently outperforms the state-of-the-art models, demonstrating an average improvement of 10.10% and 13.44% in terms of AUC and AP, respectively.
Social and Information Networks, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses link prediction in Temporal Heterogeneous Networks (THNs). When dealing with dynamic heterogeneous networks, existing methods cannot capture fine-grained distribution patterns and temporal dynamic characteristics, which the paper calls spatial heterogeneity and temporal heterogeneity. This limits their performance on the link prediction task. The paper therefore proposes a Contrastive Learning-based Link Prediction (CLP) model that encodes spatial and temporal heterogeneity through a multi-view hierarchical self-supervised architecture, with the aim of improving link prediction accuracy.

### Main Contributions

1. **Bridging Spatial and Temporal Heterogeneity**: The paper proposes a three-layer hierarchical contrastive entity-relationship extraction module that eliminates differences across multiple views, thereby bridging spatial and temporal heterogeneity in the link prediction scenario.
2. **Heterogeneous Temporal Graph Network Design**: To absorb both sequence and structure distribution paradigms and comprehensively eliminate differences between views, the paper designs a heterogeneous temporal graph network.
3. **Experimental Verification**: The paper conducts extensive experiments on four benchmark datasets. The results show that CLP outperforms existing state-of-the-art link prediction methods in predicting future links between two entities, with an average improvement of 10.10% in AUC and 13.44% in AP.

### Method Overview

1. **Structural Feature Modeling Layer**: Represents the features of the different node and edge types in a THN through a two-layer hierarchical Graph Attention Network (GAT), modeling at both the node level and the edge level. In addition, a contrastive representation method distinguishes feature heterogeneity between the node and edge levels and strengthens the model's ability to capture structural heterogeneity.
2. **Temporal Information Modeling Layer**: Uses an LSTM and a GRU to analyze the time-snapshot patterns separately and capture long- and short-term dependencies. A contrastive learning strategy then bridges the differences between these two sequence learning paradigms, thereby retaining temporal heterogeneity.
3. **Output Layer**: Represents the target link by calculating the similarity between nodes and incorporates it into the comprehensive loss function to estimate the probability that the target link exists.

### Mathematical Symbols

- \( G=\{G_1, G_2,\ldots, G_T\} \): A sequence of heterogeneous networks at different time points.
- \( G_t = (\mathcal{V}_t,\mathcal{E}_t) \): The heterogeneous snapshot graph at time point \( t \).
- \( \mathcal{V}_t \): The set of nodes at time point \( t \).
- \( \mathcal{E}_t \): The set of edges at time point \( t \).
- \( G_{t}^{\mathcal{R}} \): The sub-network at time point \( t \) with edge type \( \mathcal{R} \).
- \( T \): The maximum number of graph snapshots.
- \( \alpha_{ij}^{\mathcal{R}t} \): The attention score of node \( i \) and node \( j \) in the sub-graph of type \( \mathcal{R} \) at time point \( t \).
- \( w_{ij}^{\mathcal{R}t} \): The attention weight of node \( i \) and node \( j \) in the sub-graph of type \( \mathcal{R} \) at time point \( t \).
- \( u_i^{\mathcal{R}t} \): The node-level representation of node \( i \) in the sub-graph of type \( \mathcal{R} \) at time point \( t \).
- \( a_i^{\mathcal{R}t} \): The attention score of node \( i \) for the edge of type \( \mathcal{R} \) in the \( t \)-th snapshot.
- \( w_i^{\mathcal{R}t} \): The...
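
### Illustrative Sketch

The Method Overview describes two ideas that a small example can make concrete: a contrastive term that aligns two views of the same nodes (for instance the LSTM and GRU outputs), and an output layer that turns node similarity into a link probability. The summary does not give the exact formulas, so the sketch below rests on assumptions: an InfoNCE-style loss with cosine similarity and a temperature, and a sigmoid over the inner product as the similarity score. The function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def link_probability(h_i, h_j):
    """Assumed output layer: sigmoid of the inner product of two node embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(h_i, h_j)))


def info_nce(view_a, view_b, temperature=0.5):
    """Assumed contrastive term (InfoNCE-style).

    Row k of view_a and row k of view_b describe the same node and form the
    positive pair; every other row of view_b acts as a negative.
    """
    total = 0.0
    for k, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[k]
    return total / len(view_a)


# Toy embeddings for two nodes under two temporal views (hypothetical values).
lstm_view = [[1.0, 0.0], [0.0, 1.0]]
gru_aligned = [[0.9, 0.1], [0.1, 0.9]]   # agrees with the LSTM view
gru_swapped = [[0.1, 0.9], [0.9, 0.1]]   # disagrees with the LSTM view

aligned_loss = info_nce(lstm_view, gru_aligned)
swapped_loss = info_nce(lstm_view, gru_swapped)
p_link = link_probability([1.0, 0.0], [0.9, 0.1])

print(f"aligned views loss: {aligned_loss:.4f}")
print(f"swapped views loss: {swapped_loss:.4f}")
print(f"link probability:   {p_link:.4f}")
```

Under these assumptions, the loss is smaller when the two views agree on each node than when they disagree. That is the sense in which a contrastive term can pull the two sequence-learning paradigms together. In a full model, such a term would be combined with the link prediction loss, and how CLP weights the two is not stated in this summary.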