Efficient Algorithms for Group Hitting Probability Queries on Large Graphs
Qintian Guo, Dandan Lin, Sibo Wang, Raymond Chi-Wing Wong, Wenqing Lin
DOI: https://doi.org/10.1109/tkde.2023.3349164
Impact factor (IF): 9.235
2024-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Given a source node $s$ and a target node $t$, the hitting probability measures how likely an $\alpha$-terminating random walk (which stops with probability $\alpha$ at each step) starting from $s$ is to hit $t$ before it stops. This concept originates from the hitting time, a classic concept in random walks. In this paper, we focus on the group hitting probability (GHP), where the target is a set of nodes, measuring node-to-group structural proximity. For this group version of the hitting probability, we present efficient algorithms for two types of GHP queries: the pairwise query, which returns the GHP value of a target set $T$ with respect to (w.r.t.) a source node $s$, and the top-$k$ query, which returns the $k$ target sets with the largest GHP values w.r.t. a source node $s$. We first develop an efficient algorithm named SAMBA for the pairwise query, which is built on a group local push algorithm tailored for GHP and comes with a rigorous correctness analysis. Next, we show how to speed up SAMBA by combining the group local push algorithm with the Monte Carlo approach; here GHP brings new challenges, as it might need to consider every hop of the random walk. We tackle this issue with a new formulation of the GHP and provide approximation guarantees backed by a detailed theoretical analysis. With SAMBA as the backbone, we develop an iterative algorithm for top-$k$ queries that adaptively refines the bounds of the candidate target sets and terminates as soon as it meets the stopping condition, thus avoiding unnecessary computation. We further present an optimization technique that improves the practical performance of the top-$k$ query. Extensive experiments show that our solutions are orders of magnitude faster than their competitors.
Computer Science, Information Systems; Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
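To make the GHP definition in the abstract concrete, here is a minimal sketch. It is not the paper's SAMBA algorithm, its group local push, or its new GHP formulation. It computes the group hitting probability two baseline ways: exactly, by iterating the recurrence $h(v) = 1$ if $v \in T$, else $h(v) = (1-\alpha)\frac{1}{|N(v)|}\sum_{u \in N(v)} h(u)$, and approximately, by plain Monte Carlo sampling of $\alpha$-terminating walks (the kind of sampling the abstract says SAMBA combines with local push). Several details are assumptions rather than facts from the source. The walk is assumed to check membership in $T$ before each stop decision and then to move to a uniformly random out-neighbor. Dangling nodes are treated as stopping the walk. The graph is an adjacency dict, and the function names `exact_ghp` and `mc_ghp` are hypothetical.

```python
import random


def exact_ghp(graph, targets, alpha, iters=200):
    """Fixed-point iteration for the GHP of every node w.r.t. the target set.

    Assumed recurrence: h(v) = 1 if v in T, otherwise
    (1 - alpha) * mean of h over v's out-neighbors.
    The map is a (1 - alpha)-contraction, so the error decays geometrically.
    """
    h = {v: (1.0 if v in targets else 0.0) for v in graph}
    for _ in range(iters):
        new_h = {}
        for v, nbrs in graph.items():
            if v in targets:
                new_h[v] = 1.0  # walk has already hit the group
            elif not nbrs:
                new_h[v] = 0.0  # assumption: a dangling node ends the walk
            else:
                new_h[v] = (1 - alpha) * sum(h[u] for u in nbrs) / len(nbrs)
        h = new_h
    return h


def mc_ghp(graph, source, targets, alpha, num_walks=100_000, seed=0):
    """Plain Monte Carlo estimate: fraction of alpha-terminating walks hitting T."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_walks):
        v = source
        while True:
            if v in targets:  # hit the target group before stopping
                hits += 1
                break
            if rng.random() < alpha or not graph[v]:  # stop with probability alpha
                break
            v = rng.choice(graph[v])  # move to a uniformly random out-neighbor
    return hits / num_walks


# Toy undirected graph stored as symmetric adjacency lists.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
T = {1, 3}  # target group
h = exact_ghp(graph, T, alpha=0.2)
est = mc_ghp(graph, 0, T, alpha=0.2)
print(round(h[0], 4), round(est, 3))  # exact value for node 0 is 46/67
```

On this toy graph, solving the recurrence by hand gives $h(0) = 46/67 \approx 0.6866$, and the Monte Carlo estimate should land within a few thousandths of it. Monte Carlo needs many walks per query to reach a given accuracy, and the exact iteration touches the whole graph. That cost is the motivation, as the abstract describes it, for combining local push with sampling on large graphs.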