When Heterophily Meets Heterogeneous Graphs: Latent Graphs Guided Unsupervised Representation Learning

Zhixiang Shen, Zhao Kang
2024-09-01
Abstract: Unsupervised heterogeneous graph representation learning (UHGRL) has gained increasing attention due to its significance in handling practical graphs without labels. However, heterophily has been largely ignored, despite its ubiquitous presence in real-world heterogeneous graphs. In this paper, we define semantic heterophily and propose an innovative framework called Latent Graphs Guided Unsupervised Representation Learning (LatGRL) to handle this problem. First, we develop a similarity mining method that couples global structures and attributes, enabling the construction of fine-grained homophilic and heterophilic latent graphs to guide the representation learning. Moreover, we propose an adaptive dual-frequency semantic fusion mechanism to address the problem of node-level semantic heterophily. To cope with the massive scale of real-world data, we further design a scalable implementation. Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our proposed framework. The source code and datasets have been made available at https://github.com/zxlearningdeep/LatGRL.
Machine Learning, Artificial Intelligence, Social and Information Networks
What problem does this paper attempt to address?
This paper addresses semantic heterophily in unsupervised heterogeneous graph representation learning (UHGRL). Specifically:
1. **Defining the problem**: The authors define semantic heterophily and point out that it is widespread in real-world heterogeneous graphs, yet existing UHGRL methods largely overlook it. Semantic heterophily means that nodes of the same type connected by meta-paths may have different properties or labels.
2. **Quantifying semantic heterophily**: The authors propose two evaluation metrics: the Meta-path-level Semantic Homophily Ratio (MHR) and the Node-level Semantic Homophily Ratio (NHR). Their empirical analysis finds diverse neighborhood patterns in real-world heterogeneous graphs: different nodes show different NHRs under the same meta-path, which makes node representation learning harder.
3. **Proposing a solution**: To handle semantic heterophily, the authors propose a new framework, Latent Graphs Guided Unsupervised Representation Learning (LatGRL). Its main contributions are:
   - **Similarity mining**: Couple the global structure with node attributes to construct fine-grained homophilic and heterophilic latent graphs that guide representation learning.
   - **Adaptive dual-frequency semantic fusion mechanism**: Process complex homophilic and heterophilic neighborhood patterns together through dual-pass graph filtering, which strengthens node-level modeling.
   - **Scalable implementation**: Provide a scalable implementation that copes with large-scale real-world data.
4. **Experimental verification**: Extensive experiments on benchmark datasets verify the effectiveness and efficiency of the proposed framework.
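The two homophily ratios described above can be illustrated with a small sketch. It assumes the ratios follow the usual homophily convention: NHR is the share of a node's meta-path neighbors that carry the same label, and MHR is the share of all meta-path connections whose endpoints share a label. The summary does not give the paper's exact formulas, so treat these definitions, the helper names, and the toy paper-author-paper graph as illustrative assumptions.

```python
from collections import defaultdict


def metapath_neighbors(first_hop, second_hop):
    """Compose two relations (e.g. paper->author, author->paper) into
    meta-path neighbor sets between nodes of the same type."""
    neighbors = defaultdict(set)
    for src, mids in first_hop.items():
        for mid in mids:
            for dst in second_hop.get(mid, ()):
                if dst != src:  # ignore trivial self-connections
                    neighbors[src].add(dst)
    return dict(neighbors)


def node_homophily_ratio(neighbors, labels):
    """Assumed NHR: per-node share of meta-path neighbors with the same label."""
    ratios = {}
    for node, nbrs in neighbors.items():
        if nbrs:
            same = sum(labels[n] == labels[node] for n in nbrs)
            ratios[node] = same / len(nbrs)
    return ratios


def metapath_homophily_ratio(neighbors, labels):
    """Assumed MHR: share of meta-path connections joining same-label nodes."""
    total = sum(len(nbrs) for nbrs in neighbors.values())
    same = sum(
        labels[n] == labels[node]
        for node, nbrs in neighbors.items()
        for n in nbrs
    )
    return same / total if total else 0.0


# Hypothetical toy data: four papers, two authors, two topic labels.
paper_to_author = {"p0": ["a0"], "p1": ["a0", "a1"], "p2": ["a1"], "p3": ["a1"]}
author_to_paper = {"a0": ["p0", "p1"], "a1": ["p1", "p2", "p3"]}
paper_labels = {"p0": "ML", "p1": "ML", "p2": "DB", "p3": "DB"}

pap = metapath_neighbors(paper_to_author, author_to_paper)
nhr = node_homophily_ratio(pap, paper_labels)
mhr = metapath_homophily_ratio(pap, paper_labels)
print(nhr, mhr)
```

In this toy graph, `p0` has only same-label neighbors while `p1` has mostly different-label neighbors, even though both are reached through the same meta-path. That spread in per-node ratios is the kind of node-level variation the summary describes.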
The experimental results show that LatGRL performs well on classification and clustering tasks and handles semantic heterophily effectively. In summary, the paper addresses the shortcomings of existing UHGRL methods on semantic heterophily by introducing new evaluation metrics and a new framework, improving the quality of node representations and the performance of downstream tasks.
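The dual-frequency idea mentioned above can be sketched generically. The summary does not specify LatGRL's filters, so the sketch below assumes a common pairing: a low-pass filter that averages a node's features with its neighbors' and a high-pass filter that keeps the difference. The per-node fusion weight `alpha` is a hypothetical stand-in for whatever the adaptive mechanism learns; it is fixed by hand here.

```python
def row_normalize(adj):
    """Row-normalize a dense adjacency matrix given as nested lists."""
    out = []
    for row in adj:
        deg = sum(row)
        out.append([a / deg if deg else 0.0 for a in row])
    return out


def propagate(norm_adj, feats):
    """Average each node's neighbor features (one step of A_hat @ X)."""
    dim = len(feats[0])
    return [
        [sum(w * feats[j][d] for j, w in enumerate(row)) for d in range(dim)]
        for row in norm_adj
    ]


def dual_pass_filter(adj, feats):
    """Return (low, high): smoothed features and neighbor-difference features."""
    agg = propagate(row_normalize(adj), feats)
    low = [[(x + a) / 2 for x, a in zip(xr, ar)] for xr, ar in zip(feats, agg)]
    high = [[(x - a) / 2 for x, a in zip(xr, ar)] for xr, ar in zip(feats, agg)]
    return low, high


def fuse(low, high, alpha):
    """Blend the two signals per node; alpha[i] is the weight on the low-pass part."""
    return [
        [a * l + (1 - a) * h for l, h in zip(lr, hr)]
        for a, lr, hr in zip(alpha, low, high)
    ]


# Hypothetical path graph 0 - 1 - 2 with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [1.0], [-1.0]]

low, high = dual_pass_filter(adj, feats)
# Node 0 agrees with its neighbor, node 2 disagrees, node 1 sees a mix.
fused = fuse(low, high, alpha=[1.0, 0.5, 0.0])
print(low, high, fused)
```

Under these assumptions, the low-pass output suits nodes whose neighbors resemble them and the high-pass output suits nodes whose neighbors differ, so a per-node weight lets each node lean on whichever signal matches its neighborhood.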