Meta-Path-Free Representation Learning on Heterogeneous Networks

Jie Zhang, Jinru Ding, Suyuan Liu, Hongyan Wu
DOI: https://doi.org/10.48550/arXiv.2102.08120
2021-02-16
Abstract: Real-world networks and knowledge graphs are usually heterogeneous networks. Representation learning on heterogeneous networks is not only a popular but also a pragmatic research field. The main challenge comes from the heterogeneity: the diverse types of nodes and edges. Besides, for a given node in a heterogeneous information network (HIN), the significance of a neighboring node depends not only on the structural distance but also on semantics. How to effectively capture both structural and semantic relations is another challenge. The current state-of-the-art methods are based on meta-paths and therefore have a serious disadvantage: their performance depends on the arbitrary choice of meta-path(s). Moreover, the selection of meta-path(s) is experience-based and time-consuming. In this work, we propose a novel meta-path-free representation learning method for heterogeneous networks, namely Heterogeneous graph Convolutional Networks (HCN). The proposed method fuses the heterogeneity and develops a $k$-strata algorithm ($k$ is an integer) to capture the $k$-hop structural and semantic information in heterogeneous networks. To the best of our knowledge, this is the first attempt to break out of the confinement of meta-paths for representation learning on heterogeneous networks. We carry out extensive experiments on three real-world heterogeneous networks. The experimental results demonstrate that the proposed method significantly outperforms the current state-of-the-art methods in a variety of analytic tasks.
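The abstract's two ingredients, fusing heterogeneous node features and a $k$-strata algorithm over $k$-hop neighborhoods, can be illustrated with a toy forward pass. This is a minimal sketch, not the authors' implementation, and it rests on assumptions the abstract does not state: (a) each node type $t$ has a projection matrix $W_t$ into a shared feature space, (b) all node and edge types are merged into one adjacency matrix $A$, so no meta-path is chosen, and (c) the $s$-th stratum is the row-normalized power $\hat{A}^s$ with $\hat{A} = D^{-1}(A + I)$, for $s = 1, \dots, k$, with the outputs of the $k$ strata simply averaged. The function names `project_features`, `k_strata`, and `propagate`, and the toy data, are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: a is n x m, b is m x p."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def project_features(features: Dict[int, List[float]], node_type: Dict[int, str],
                     weights: Dict[str, Matrix]) -> Matrix:
    """Map each node's type-specific features into a shared space with W_{type(v)}.
    Nodes are assumed to be numbered 0..n-1."""
    rows = []
    for v in range(len(features)):
        w = weights[node_type[v]]  # shape (d_type, d_shared); assumed, not from the paper
        rows.append(matmul([features[v]], w)[0])
    return rows


def normalize(adj: Matrix) -> Matrix:
    """Row-normalize A + I; the self-loop keeps each node's own features."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        deg = sum(row)
        out.append([x / deg for x in row])
    return out


def k_strata(adj: Matrix, k: int) -> List[Matrix]:
    """Assumed strata: the normalized powers A_hat^1, ..., A_hat^k (one per hop count)."""
    a_hat = normalize(adj)
    strata, power = [], a_hat
    for _ in range(k):
        strata.append(power)
        power = matmul(power, a_hat)
    return strata


def propagate(adj: Matrix, features: Dict[int, List[float]], node_type: Dict[int, str],
              weights: Dict[str, Matrix], k: int) -> Matrix:
    """One untrained, linear forward pass: project, propagate per stratum, average."""
    h = project_features(features, node_type, weights)
    outs = [matmul(s, h) for s in k_strata(adj, k)]
    # Plain averaging is a stand-in for whatever stratum weighting the paper uses.
    return [[sum(o[i][j] for o in outs) / k for j in range(len(h[0]))]
            for i in range(len(h))]


# Toy chain: author 0 -- paper 1 -- venue 2 (hypothetical features and weights).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = {0: [1.0, 0.0], 1: [0.0, 1.0, 0.0], 2: [1.0]}
node_type = {0: "author", 1: "paper", 2: "venue"}
weights = {"author": [[1, 0], [0, 1]],
           "paper": [[1, 0], [0, 1], [0, 0]],
           "venue": [[0.5, 0.5]]}
emb = propagate(adj, features, node_type, weights, k=2)
print(emb)
```

Each node type keeps its own input dimension (2, 3, and 1 here) until its $W_t$ maps it to the shared 2-dimensional space, which is what lets one adjacency matrix mix all types. Because $\hat{A}$ is row-stochastic, each stratum is a weighted average of projected neighbor features at a given hop distance; in a trained model the $W_t$ (and any stratum weights) would be learned by gradient descent.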
Social and Information Networks, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses how to capture both the structural and the semantic relationships between nodes when learning representations on heterogeneous networks, without the limitations that come from relying on meta-path selection. Existing meta-path-based methods (such as metapath2vec and HAN) depend heavily on which meta-paths are chosen, and that choice is usually experience-based and time-consuming. The paper therefore proposes a meta-path-free representation learning method for heterogeneous networks, Heterogeneous graph Convolutional Networks (HCN), to overcome these challenges.

### Main problems

1. **Heterogeneity**: Heterogeneous networks contain multiple types of nodes and edges, which makes representation learning more complex.
2. **Structural and semantic relationships**: In a heterogeneous network, the importance of a neighboring node depends not only on its structural distance but also on its semantic relationship to the target node. Capturing both kinds of relationship effectively is a challenge.
3. **Meta-path selection**: Existing methods depend on a choice of meta-paths that is usually experience-based and time-consuming, and different meta-paths lead to different results.

### Solutions

The paper proposes the following solutions:

- **$k$-strata algorithm**: The $k$-strata algorithm naturally fuses the various compound relations in a heterogeneous network, capturing $k$-hop structural and semantic information without relying on meta-paths.
- **Feature fusion**: Trainable, type-specific transformation matrices project the features of each node type into a common feature space, so that the different feature spaces are unified.
- **Online sparsification**: To reduce training cost, the paper proposes an online sparsification technique that randomly drops some edges from the $k$-strata adjacency matrix during training, lowering the matrix's density.

### Experimental verification

The paper reports extensive experiments on three real-world heterogeneous network datasets (DBLP, IMDB, and AMiner); according to the authors, the proposed method significantly outperforms existing state-of-the-art methods on multiple analytic tasks.

### Main contributions

1. **First attempt**: According to the authors, this is the first attempt to break free of meta-paths for representation learning on heterogeneous networks.
2. **Capturing structural and semantic relationships**: The proposed method captures both the structural and the semantic relationships between nodes.
3. **Experimental verification**: Extensive experiments on multiple datasets and tasks support the method's effectiveness.

In conclusion, the paper addresses the dependence of existing methods on meta-path selection by proposing a meta-path-free representation learning method for heterogeneous networks, and it reports significant performance improvements on multiple real-world datasets.
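The online sparsification step listed under Solutions can be sketched as random edge dropping on the nonzero entries of a stratum matrix. This is a minimal sketch under assumptions the summary does not state: each stratum is stored as a list of `(row, col, weight)` entries, each entry is dropped independently with probability `drop_prob`, self-loops are always kept, a fresh subset is drawn at every training step, and the surviving weights are re-normalized per row. The names `sparsify` and `renormalize` are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (row, col, weight) of one nonzero stratum entry


def sparsify(edges: List[Edge], drop_prob: float, rng: random.Random,
             keep_self_loops: bool = True) -> List[Edge]:
    """Return a random subset of a stratum's nonzero entries.

    Each entry is dropped independently with probability `drop_prob`. Calling this
    once per training step gives every step a different, sparser matrix, which is
    one assumed reading of "online" sparsification.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    kept = []
    for i, j, w in edges:
        if (keep_self_loops and i == j) or rng.random() >= drop_prob:
            kept.append((i, j, w))
    return kept


def renormalize(edges: List[Edge]) -> List[Edge]:
    """Rescale the kept weights so each remaining row sums to 1 (an assumption)."""
    row_sum = {}
    for i, _, w in edges:
        row_sum[i] = row_sum.get(i, 0.0) + w
    return [(i, j, w / row_sum[i]) for i, j, w in edges]


rng = random.Random(0)  # fixed seed so the toy run is reproducible
# Toy dense stratum over 4 nodes: every entry is nonzero and each row sums to 1.
stratum = [(i, j, 0.25) for i in range(4) for j in range(4)]
for step in range(3):
    sparse = renormalize(sparsify(stratum, drop_prob=0.5, rng=rng))
    print(step, len(sparse), "of", len(stratum), "entries kept")
```

Dropping roughly half the entries halves the cost of each sparse propagation step on average, while keeping self-loops guarantees no node loses its own features. Whether the paper re-normalizes after dropping is not stated in the summary; the re-normalization here only keeps the toy strata row-stochastic.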