Fast All-Pairs SimRank Assessment on Large Graphs and Bipartite Domains

Weiren Yu, Xuemin Lin, Wenjie Zhang, Julie A. McCann
DOI: https://doi.org/10.1109/TKDE.2014.2339828
2015-01-01
Abstract: SimRank is a powerful model for assessing vertex-pair similarities in a graph. It follows the concept that two vertices are similar if they are referenced by similar vertices. The prior work [18] exploits partial sums memoization to compute SimRank in $O(Kmn)$ time on a graph of $n$ vertices and $m$ edges, for $K$ iterations. However, computations among different partial sums may have redundancy. Besides, to guarantee a given accuracy $\epsilon$, the existing SimRank needs $K=\lceil \log_C \epsilon \rceil$ iterations, where $C$ is a damping factor, but the geometric rate of convergence is slow if a high accuracy is expected. In this paper, (1) a novel clustering strategy is proposed to eliminate duplicate computations occurring in partial sums, and an efficient algorithm is then devised to accelerate SimRank computation to $O(K d' n^2)$ time, where $d'$ is typically much smaller than $\frac{m}{n}$. (2) A new differential SimRank equation is proposed, which can represent the SimRank matrix as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) In bipartite domains, a novel finer-grained partial max clustering method is developed to speed up the computation of the Minimax SimRank variation from $O(Kmn)$ to $O(Km'n)$ time, where $m' (\le m)$ is the number of edges in a reduced graph after edge clustering, which can be typically much smaller than $m$. Usi
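To make the baseline concrete, the following is a minimal Python sketch of the conventional (naive) SimRank iteration together with the iteration count $K=\lceil \log_C \epsilon \rceil$ quoted in the abstract. It is not the paper's clustering-based or differential algorithm. The function names, the `in_neighbors` dictionary layout, and the convention that a vertex with no in-neighbors has similarity 0 to every other vertex are assumptions made for illustration.

```python
import math


def iterations_for_accuracy(C, eps):
    """Iteration count K = ceil(log_C eps) for damping factor C and accuracy eps."""
    return math.ceil(math.log(eps) / math.log(C))


def naive_simrank(in_neighbors, C=0.6, K=5):
    """Naive all-pairs SimRank (illustrative baseline, not the paper's method).

    in_neighbors: dict mapping each vertex to a list of its in-neighbors;
                  every in-neighbor must also appear as a key (assumed layout).
    Returns a nested dict sim[a][b] after K iterations.
    """
    nodes = list(in_neighbors)
    # Iteration 0: a vertex is fully similar to itself and to nothing else.
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(K):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                    continue
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if not Ia or not Ib:
                    # Assumed convention: no in-neighbors means similarity 0.
                    new[a][b] = 0.0
                    continue
                # Average similarity over all in-neighbor pairs, damped by C.
                total = sum(sim[i][j] for i in Ia for j in Ib)
                new[a][b] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim


if __name__ == "__main__":
    # Hypothetical toy graph: u references both a and b.
    graph = {"u": [], "a": ["u"], "b": ["u"]}
    print(naive_simrank(graph, C=0.6, K=1)["a"]["b"])
    print(iterations_for_accuracy(0.6, 0.001))
```

In the toy graph, `a` and `b` share their only in-neighbor `u`, so their similarity equals the damping factor after one iteration. The double loop over in-neighbor pairs is the cost that partial sums memoization, and the clustering strategy described in the abstract, aim to reduce.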