ICS-GNN+: Lightweight Interactive Community Search Via Graph Neural Network
Jiazun Chen, Jun Gao, Bin Cui
DOI: https://doi.org/10.1007/s00778-022-00754-0
2022-01-01
The VLDB Journal
Abstract: Searching for a community containing a query node in an online social network has wide applications, such as recommendation and team organization. When applied to real-life networks, existing approaches face two major limitations. First, they usually take two steps, crawling a large part of the network and then finding the community, but the entire network is usually too big and most of the data are of no interest to end users. Second, existing methods use hand-crafted rules to measure community membership, yet effective rules are very difficult to define because communities vary across query nodes. This paper proposes an interactive community search method based on graph neural networks (abbreviated as ICS-GNN(+)) that iteratively locates the target community over a subgraph collected on the fly from an online network. In each iteration, we first build a candidate subgraph around the query node and the labeled nodes. We then train a GNN node classification model to determine whether each node belongs to the target community; the model captures similarities between nodes by combining content and structural features seamlessly and flexibly under the guidance of users' labeling. Based on the probabilities inferred by the trained GNN, we introduce a k-sized Maximum-GNN-scores (abbreviated as kMG) community to describe the target community and design a method to locate it; the located kMG community is then evaluated by end users to acquire more feedback. In addition, we propose several optimization strategies, including an adaptive method to maintain the subgraph across iterations, incorporating a ranking loss into the GNN model, generating node embeddings enhanced by pseudo-labels from node clusters in the subgraph, and a greedy community search method with benefit computed globally.
Experiments on both offline and online real-life datasets demonstrate that ICS-GNN(+) produces effective communities with low overhead in communication, computation, and user labeling.
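To make the kMG idea in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm. It assumes that a kMG community can be read as a connected set of k nodes that contains the query node and has a large total GNN score, and it grows that set with a simple locally greedy rule. The function name `greedy_kmg`, the adjacency-dict graph format, and the toy scores are all hypothetical choices made for illustration.

```python
import heapq


def greedy_kmg(adj, scores, query, k):
    """Grow a connected node set of size at most k around `query`.

    adj    : dict mapping node -> iterable of neighbour nodes (assumed format)
    scores : dict mapping node -> membership probability, e.g. from a GNN
    query  : the query node, always included in the result
    k      : target community size
    """
    community = {query}
    # Max-heap emulated with negated scores; holds candidate frontier nodes.
    frontier = [(-scores[n], n) for n in adj.get(query, ())]
    heapq.heapify(frontier)
    while frontier and len(community) < k:
        _, node = heapq.heappop(frontier)
        if node in community:
            continue  # stale duplicate entry
        community.add(node)
        # Newly reachable neighbours become candidates, which keeps
        # the community connected by construction.
        for nb in adj.get(node, ()):
            if nb not in community:
                heapq.heappush(frontier, (-scores[nb], nb))
    return community


# Toy example with made-up scores (not taken from the paper).
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
scores = {0: 0.9, 1: 0.2, 2: 0.8, 3: 0.95}
result = greedy_kmg(adj, scores, query=0, k=3)
```

On this toy graph the sketch returns the nodes 0, 1, and 2, even though the connected set of 0, 1, and 3 has a higher total score: a purely local choice can miss a high-scoring node that sits behind a low-scoring one. The abstract mentions a greedy search with benefit computed globally; its details are not given here, so this sketch should not be taken as that method.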