<inline-formula><tex-math notation="LaTeX">$\mathsf{PF\text{-}HIN}$</tex-math></inline-formula>: Pre-Training for Heterogeneous Information Networks

Yang Fang, Xiang Zhao, Yifan Chen, Weidong Xiao, Maarten de Rijke
DOI: https://doi.org/10.1109/tkde.2022.3206597
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: In network representation learning we learn how to represent heterogeneous information networks in a low-dimensional space so as to facilitate effective search, classification, and prediction solutions. Previous network representation learning methods typically require sufficient task-specific labeled data to address domain-specific problems, and the trained model usually cannot be transferred to out-of-domain datasets. We propose a self-supervised pre-training and fine-tuning framework, <inline-formula><tex-math notation="LaTeX">$\mathsf{PF\text{-}HIN}$</tex-math></inline-formula>, to capture the features of a heterogeneous information network. Unlike traditional network representation learning models, which must be retrained from scratch for every downstream task and dataset, <inline-formula><tex-math notation="LaTeX">$\mathsf{PF\text{-}HIN}$</tex-math></inline-formula> only needs to fine-tune the model and a small number of extra task-specific parameters, thus improving model efficiency and effectiveness. During pre-training, we first transform the neighborhood of a given node into a sequence. <inline-formula><tex-math notation="LaTeX">$\mathsf{PF\text{-}HIN}$</tex-math></inline-formula> is then pre-trained on two self-supervised tasks: masked node modeling and adjacent node prediction. We adopt deep bi-directional transformer encoders to train the model, and leverage factorized embedding parameterization and cross-layer parameter sharing to reduce the number of parameters. In the fine-tuning stage, we choose four benchmark downstream tasks, namely link prediction, similarity search, node classification, and node clustering. <inline-formula><tex-math notation="LaTeX">$\mathsf{PF\text{-}HIN}$</tex-math></inline-formula> outperforms state-of-the-art alternatives on each of these tasks, on four datasets.
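
The parameter savings from the two reduction techniques named in the abstract can be illustrated with a short arithmetic sketch. It assumes the usual formulation of factorized embedding parameterization (a V x E table followed by an E x H projection, in place of a single V x H table) and of cross-layer parameter sharing (one set of layer weights reused by every layer). The function names and all sizes below are hypothetical and are not taken from the paper.

```python
from typing import Optional


def embedding_params(num_nodes: int, hidden: int, embed: Optional[int] = None) -> int:
    """Parameter count of the node-embedding component.

    Unfactorized: one num_nodes x hidden table.
    Factorized:   a num_nodes x embed table plus an embed x hidden projection.
    """
    if embed is None:
        return num_nodes * hidden
    return num_nodes * embed + embed * hidden


def encoder_params(per_layer: int, num_layers: int, shared: bool) -> int:
    """Parameter count of the encoder stack.

    With cross-layer sharing, one layer's weights are stored once and reused.
    """
    return per_layer if shared else per_layer * num_layers


# Hypothetical sizes, chosen only for illustration (assumptions, not paper values).
NUM_NODES, HIDDEN, EMBED = 100_000, 768, 128
PER_LAYER, NUM_LAYERS = 7_000_000, 12

full = embedding_params(NUM_NODES, HIDDEN)
factorized = embedding_params(NUM_NODES, HIDDEN, EMBED)
unshared = encoder_params(PER_LAYER, NUM_LAYERS, shared=False)
shared = encoder_params(PER_LAYER, NUM_LAYERS, shared=True)

print(f"embedding: {full:,} -> {factorized:,}")
print(f"encoder:   {unshared:,} -> {shared:,}")
```

Under these assumed sizes, factorization helps whenever the embedding width is much smaller than the hidden width, because the large node-count factor multiplies only the smaller dimension.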