Self-Training GNN-based Community Search in Large Attributed Heterogeneous Information Networks

Yuan Li, Xiuxu Chen, Yuhai Zhao, Wen Shan, Zhengkui Wang, Guoli Yang, Guoren Wang
DOI: https://doi.org/10.1109/icde60146.2024.00216
2024-01-01
Abstract: Attributed Heterogeneous Information Networks (AHINs) amalgamate the advantages of attributed graphs (AGs) and heterogeneous information networks (HINs) to model intricate systems. Within this context, community search, which aims to identify the most probable community containing the queried vertex, has been extensively explored in AGs and HINs. However, existing methods cannot simultaneously accommodate heterogeneous attributes and multiple meta-paths in AHINs, which makes community search in large AHINs a substantial challenge. Recent studies highlight the efficacy of machine learning-based community search, which offers greater flexibility and higher-quality communities than traditional structure-based methods. Yet semi-supervised learning methods demand substantial labeled data and incur considerable memory and time costs when applied to large AHINs. To tackle these challenges, we propose an MK (Most-likely; K-sized) community search approach. This approach defines an MK community and leverages Graph Neural Networks (GNNs) to amalgamate structures and attributes into a unified goodness metric. Our method trains on local subgraphs sampled via guided random walks based on multiple meta-paths, circumventing the need for training on the entire graph. Moreover, attention-based GNNs learn meta-path weights that guide the weighted walks in subsequent iterations. Additionally, self-training is employed to alleviate the labeling burden. We also demonstrate that locating the MK community is NP-hard and present a heuristic local search strategy that expedites the resolution process through rewriting. The solution is obtained once the iterations converge. Extensive experiments on four real-world datasets show that the MK framework significantly enhances both effectiveness and efficiency in community search within AHINs.
Our code is publicly available at https://github.com/uucxuu/CSAH.
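
To make the sampling step in the abstract concrete, the following is a minimal sketch of drawing a local subgraph around a query vertex with random walks guided by several weighted meta-paths. It is not the authors' implementation: the toy graph, the function names metapath_walk and sample_local_subgraph, and the fixed meta-path weights are all assumptions made for illustration. In the paper the weights are learned by an attention-based GNN and updated between iterations; here they are simply passed in.

```python
import random
from collections import defaultdict

# Toy heterogeneous graph (assumed for illustration): authors (A), papers (P), venue (V).
node_type = {"a1": "A", "a2": "A", "a3": "A", "p1": "P", "p2": "P", "v1": "V"}
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
         ("a3", "p2"), ("p1", "v1"), ("p2", "v1")]
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)


def metapath_walk(adj, node_type, start, metapath, length, rng):
    """Random walk from `start` whose node types follow `metapath` cyclically.

    `metapath` is a list of node types that starts and ends with the same
    type, e.g. ["A", "P", "A"], so the pattern can repeat.
    """
    if node_type[start] != metapath[0]:
        raise ValueError("start vertex type must match the meta-path head")
    period = len(metapath) - 1
    walk = [start]
    step = 0
    while len(walk) < length:
        wanted = metapath[(step % period) + 1]
        candidates = [v for v in adj[walk[-1]] if node_type[v] == wanted]
        if not candidates:  # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
        step += 1
    return walk


def sample_local_subgraph(adj, node_type, query, metapaths, weights,
                          num_walks, length, seed=0):
    """Collect the vertices visited by `num_walks` meta-path-guided walks.

    Each walk picks one meta-path with probability proportional to its
    weight (hypothetical stand-in for attention-learned weights).
    """
    rng = random.Random(seed)
    nodes = {query}
    for _ in range(num_walks):
        mp = rng.choices(metapaths, weights=weights, k=1)[0]
        nodes.update(metapath_walk(adj, node_type, query, mp, length, rng))
    return nodes


query = "a1"
metapaths = [["A", "P", "A"], ["A", "P", "V", "P", "A"]]
weights = [0.7, 0.3]  # assumed values, not taken from the paper
nodes = sample_local_subgraph(adj, node_type, query, metapaths, weights,
                              num_walks=10, length=5)
print(sorted(nodes))
```

Training would then run only on the subgraph induced by the sampled vertices rather than on the whole network, which is the memory and time saving the abstract refers to.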