Dynamical SimRank Search on Time-Varying Networks
Weiren Yu, Xuemin Lin, Wenjie Zhang, Julie A. McCann
DOI: https://doi.org/10.1007/s00778-017-0488-z
2017-01-01
The VLDB Journal
Abstract: SimRank is an appealing pair-wise similarity measure based on graph structure. It iteratively follows the intuition that two nodes are assessed as similar if they are pointed to by similar nodes. Many real graphs are large, and their links are constantly subject to minor changes. In this article, we study the efficient dynamical computation of all-pairs SimRanks on time-varying graphs. Existing methods for dynamical SimRank computation [e.g., LTSF (Shao et al. in PVLDB 8(8):838–849, 2015) and READS (Zhang et al. in PVLDB 10(5):601–612, 2017)] mainly focus on top-k search with respect to a given query. For all-pairs dynamical SimRank search, Li et al. (EDBT, 2010) proposed an approach that first factorizes the graph via a singular value decomposition (SVD) and then incrementally maintains this factorization in response to link updates, at the expense of exactness. As a result, all pairs of SimRanks are updated only approximately, requiring \(O(r^{4}n^2)\) time and \(O(r^{2}n^2)\) memory in a graph with n nodes, where r is the target rank of the low-rank SVD. Our solution to the dynamical computation of SimRank comprises five ingredients: (1) We first consider edge updates that do not accompany new node insertions. We show that the SimRank update \(\Delta \mathbf{S}\) in response to every link update is expressible as a rank-one Sylvester matrix equation. This provides an incremental method requiring \(O(Kn^2)\) time and \(O(n^2)\) memory in the worst case to update \(n^2\) pairs of similarities for K iterations. (2) To speed up the computation further, we propose a lossless pruning strategy that captures the “affected areas” of \(\Delta \mathbf{S}\) to eliminate unnecessary retrieval.
This reduces the time of the incremental SimRank to \(O(K(m+|\textsf{AFF}|))\), where m is the number of edges in the old graph and \(|\textsf{AFF}| \ (\le n^2)\) is the size of the “affected areas” in \(\Delta \mathbf{S}\); in practice, \(|\textsf{AFF}| \ll n^2\). (3) We also consider edge updates that accompany node insertions, and categorize them into three cases according to which end of the inserted edge is a new node. For each case, we devise an efficient incremental algorithm that supports new node insertions and accurately updates the affected SimRanks. (4) We next study batch updates for dynamical SimRank computation, and design an efficient batch incremental method that handles “similar sink edges” simultaneously and eliminates redundant edge updates. (5) To achieve linear memory, we devise a memory-efficient strategy that dynamically updates all pairs of SimRanks column by column in just \(O(Kn+m)\) memory, without the need to store all \(n^2\) pairs of old SimRank scores. Experimental studies on various datasets demonstrate that our solution substantially outperforms the existing incremental SimRank methods and is faster and more memory-efficient than its competitors on million-scale graphs.
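
To make ingredients (1) and (2) concrete, the following minimal sketch illustrates why a single edge insertion is a rank-one event and why the affected area can be much smaller than \(n^2\). It is not the incremental algorithm of the article: it simply recomputes SimRank from scratch before and after the update and compares the results. It assumes the commonly used matrix form \(\mathbf{S} = C \mathbf{Q} \mathbf{S} \mathbf{Q}^{T} + (1-C)\mathbf{I}\), where \(\mathbf{Q}\) is the backward transition matrix; the function names, the toy graph, the damping factor 0.6 and the tolerance are all illustrative assumptions.

```python
import numpy as np

def backward_transition(adj):
    """Build Q with Q[i, j] = 1/in_degree(i) if there is an edge j -> i.

    Convention assumed here: adj[j, i] = 1 means an edge j -> i.
    """
    adj = np.asarray(adj, dtype=float)
    in_deg = adj.sum(axis=0)          # column sums are in-degrees
    q = adj.T.copy()                  # q[i, j] = 1 if j -> i
    has_in = in_deg > 0
    q[has_in] /= in_deg[has_in, None] # normalise rows that have in-neighbours
    return q

def simrank_matrix_form(adj, c=0.6, k=10):
    """K iterations of S <- c * Q S Q^T + (1 - c) * I (assumed matrix form)."""
    q = backward_transition(adj)
    identity = np.eye(q.shape[0])
    s = (1 - c) * identity
    for _ in range(k):
        s = c * q @ s @ q.T + (1 - c) * identity
    return s

# Hypothetical toy graph with edges 0->2, 1->2, 0->3.
old_adj = np.zeros((4, 4))
old_adj[0, 2] = old_adj[1, 2] = old_adj[0, 3] = 1

# Link update: insert the edge 1->3 (no new node is created).
new_adj = old_adj.copy()
new_adj[1, 3] = 1

# Only the row of Q belonging to the edge's target node changes,
# so the change in Q has rank one.
delta_q = backward_transition(new_adj) - backward_transition(old_adj)
delta_q_rank = np.linalg.matrix_rank(delta_q)

# Brute-force Delta S, used only to inspect which pairs were affected.
delta_s = simrank_matrix_form(new_adj) - simrank_matrix_form(old_adj)
affected_pairs = np.argwhere(np.abs(delta_s) > 1e-12)

print("rank of Delta Q:", delta_q_rank)
print("affected pairs:", affected_pairs.tolist())
```

Comparing `affected_pairs` with the full \(n^2\) grid gives a feel for the quantity \(|\textsf{AFF}|\) that the pruning strategy exploits; an incremental method would compute these entries directly rather than by the two full recomputations used here.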