Efficient SimRank-based Similarity Join over Large Graphs.

Weiguo Zheng, Lei Zou, Yansong Feng, Lei Chen, Dongyan Zhao
DOI: https://doi.org/10.1145/3083899
IF: 1.6289
2017-01-01
ACM Transactions on Database Systems
Abstract: Graphs are widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs benefits applications such as friend recommendation in social networks and link prediction. In this article, we adopt SimRank [13] to evaluate the similarity between two vertices in a large graph because of its generality. Note that SimRank is purely structure dependent and does not rely on domain knowledge. Specifically, we define a SimRank-based join (SRJ) query to find all vertex pairs satisfying the threshold from two sets of vertices U and V. To reduce the search space, we propose a shortest-path-distance-based upper bound for SimRank scores to prune unpromising vertex pairs. In the verification, we propose a novel index, called h-go cover+, to efficiently compute the SimRank score of any single vertex pair. Given a graph G, we materialize the SimRank scores of only a small proportion of vertex pairs (i.e., the h-go cover+ vertex pairs), from which the SimRank score of any vertex pair can be computed easily. To find the h-go cover+ vertex pairs, we propose an efficient method that does not build the vertex-pair graph, so large graphs can be handled. Extensive experiments over both real and synthetic datasets confirm the efficiency of our solution.
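To make the SRJ query concrete, the following is a minimal sketch of the naive baseline that the abstract's techniques are meant to improve on: it computes SimRank for every vertex pair with the standard iterative definition and then filters the pairs in U × V. It is not the paper's method; it includes neither the shortest-path-distance-based upper bound nor the h-go cover+ index. The function names, the decay factor `c = 0.6`, the iteration count, and the reading of "satisfying the threshold" as "score >= threshold" are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive iterative SimRank over all vertex pairs.

    in_neighbors maps every vertex to the list of its in-neighbors.
    c (decay factor) and iterations are illustrative defaults.
    """
    nodes = list(in_neighbors)
    # Iteration 0: a vertex is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A vertex without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new_sim[(a, b)] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def simrank_join(in_neighbors, U, V, threshold, c=0.6, iterations=10):
    """Return all (u, v) in U x V whose SimRank score is >= threshold."""
    sim = simrank(in_neighbors, c, iterations)
    return sorted((u, v) for u in U for v in V if sim[(u, v)] >= threshold)


# Tiny example graph with edges a -> b and a -> c.
graph = {"a": [], "b": ["a"], "c": ["a"]}
result = simrank_join(graph, U=["b"], V=["a", "c"], threshold=0.5)
```

This baseline scores all |V(G)|² vertex pairs on every iteration, which is the cost that makes pruning and a per-pair index attractive on large graphs.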