ExactSim: benchmarking single-source SimRank algorithms with high-precision ground truths

Hanzhi Wang, Zhewei Wei, Yu Liu, Ye Yuan, Xiaoyong Du, Ji-Rong Wen
DOI: https://doi.org/10.1007/s00778-021-00672-7
2021-06-05
The VLDB Journal
Abstract: SimRank is a popular measure of node-to-node similarity based on graph topology. In recent years, single-source and top-k SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, the Power Method, is computationally infeasible on graphs with more than 10^6 nodes. Consequently, no existing work has evaluated the actual accuracy of the various single-source and top-k SimRank algorithms on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes exact single-source and top-k SimRank results on large graphs. ExactSim produces ground truths with precision up to 7 decimal places with high probability. Using these ground truths, we present the first experimental study of the accuracy/cost trade-offs of existing approximate SimRank algorithms on large real-world and synthetic graphs. Finally, we use the ground truths to explore various properties of SimRank distributions on large graphs.
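The Power Method named in the abstract iterates the SimRank recursion s(u, v) = c / (|I(u)| |I(v)|) · Σ s(a, b), summed over in-neighbors a of u and b of v, with s(u, u) = 1, over all n² node pairs at once. The sketch below is a minimal dense illustration of that baseline, not ExactSim; the helper names `simrank_power_method` and `single_source_topk`, the decay factor c = 0.6, and the numpy layout are assumptions chosen for illustration. It stops after the fewest iterations L with c^(L+1) ≤ eps, relying on the standard fact that the L-th iterate underestimates every SimRank score by at most c^(L+1). A single-source or top-k answer is then just one row of the resulting matrix.

```python
import math

import numpy as np


def simrank_power_method(n, edges, c=0.6, eps=1e-7):
    """All-pairs SimRank by the Power Method (dense, O(n^2) memory).

    Hypothetical helper: nodes are 0..n-1 and `edges` holds directed
    (src, dst) pairs. Runs the fewest iterations L with c**(L+1) <= eps,
    which bounds the entrywise truncation error of the L-th iterate.
    """
    # W[a, u] = 1 / |I(u)| if a is an in-neighbor of u, else 0.
    W = np.zeros((n, n))
    for src, dst in edges:
        W[src, dst] = 1.0
    in_deg = W.sum(axis=0)
    has_in = in_deg > 0
    W[:, has_in] /= in_deg[has_in]  # nodes without in-neighbors keep a zero column

    iters = max(0, math.ceil(math.log(eps) / math.log(c)) - 1)
    S = np.eye(n)  # S_0 = identity
    for _ in range(iters):
        # (W^T S W)[u, v] = sum over a in I(u), b in I(v) of S[a, b] / (|I(u)| |I(v)|)
        S = c * (W.T @ S @ W)
        np.fill_diagonal(S, 1.0)  # s(u, u) = 1 by definition
    return S


def single_source_topk(S, u, k):
    """Hypothetical helper: top-k nodes most similar to u (u excluded), from row u."""
    scores = S[u].copy()
    scores[u] = -np.inf  # never report the source itself
    order = np.argsort(-scores, kind="stable")[:k]
    return [(int(v), float(scores[v])) for v in order]


if __name__ == "__main__":
    # Nodes 2 and 3 share in-neighbors {0, 1}, which have no in-neighbors,
    # so s(2, 3) = c * (1 + 0 + 0 + 1) / 4 = c / 2.
    edges = [(0, 2), (1, 2), (0, 3), (1, 3)]
    S = simrank_power_method(4, edges)
    print(single_source_topk(S, 2, 1))
```

The n × n matrix S is what rules this approach out at scale: for n = 10^6 it holds 10^12 entries, roughly 8 TB in float64, before counting the cost of each dense matrix product.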
computer science, information systems, hardware & architecture
What problem does this paper attempt to address?