Efficient Single-Source SimRank Query by Path Aggregation

Mingxi Zhang,Yanghua Xiao,Wei Wang
DOI: https://doi.org/10.1145/3580305.3599328
2023-01-01
Abstract:Single-source SimRank query calculates the similarity between a query node and every node in a graph, which traverses the paths starting from the query node for similarity computation. However, the scale of the paths increases exponentially as path length increases, which decreases the computation efficiency. Sampling-based algorithms reduce computational cost by path sampling, but they need to sample sufficient paths to ensure the accuracy, and the performance might be affected by the large scale of paths. In this paper, we propose VecSim for efficient single-source SimRank query by path aggregation. VecSim first aggregates the paths starting from query node with common arrived nodes step by step to obtain the hitting probabilities, and then aggregates the paths starting from the arrived nodes reversely to obtain the first-meeting probabilities in a similar way, in which only several vectors are maintained. The extra-meeting probabilities are excluded from each step, and an efficient sampling-based algorithm is designed, which estimates the extra-meeting probabilities by sampling paths within a specified length. For further speeding up query processing, we propose a threshold-sieved algorithm, which prunes the entries with small values that contribute little to the final similarity scores by setting a threshold. Extensive experiments are done on four small and four large graphs, which demonstrate that VecSim outperforms the competitors in terms of time and space costs on a comparable accuracy. In particular, VecSim achieves an empirical error of 10 -4 level in under 0.1 second over all of these graphs.
What problem does this paper attempt to address?