Abstract: Link-based similarity search aims to find nodes similar to a given query node in a graph. It arises in numerous applications, including web spam detection, social network analysis, and web search. Among existing methods, SimRank is a well-known similarity model that provides an effective and trustworthy function for similarity search. Many recent techniques for SimRank similarity search compute similarity scores by traversing the paths between the query and candidate nodes. However, the number of paths increases exponentially with path length, which makes the computation expensive and prevents fast similarity search over large graphs. In this paper, we propose an efficient index-free SimRank similarity search approach, namely DisSim, which reduces the computational cost by discounting path length. We observe that SimRank converges rapidly to a stable state and that the results change little after a few iterations. Based on this fast convergence, we define the similarity between nodes as the SimRank score at the second iteration. To compute DisSim, we divide the similarity into one-step and two-step first-meeting probabilities. The one-step first-meeting probabilities are computed by path traversals from the query to candidate nodes, which reduces computational cost by skipping unnecessary nodes. The two-step first-meeting probabilities are computed by integrating the repeated parts of the paths. To further speed up query processing, we develop a pruning algorithm that prunes unpromising path traversals using a threshold, and we derive the accuracy loss incurred by the threshold through mathematical analysis. Extensive experiments on real graphs demonstrate the performance of DisSim in comparison with state-of-the-art algorithms.
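
To make the idea of "the SimRank score at the second iteration" concrete, the following is a minimal sketch of the standard naive SimRank recurrence stopped after two iterations. It is not the DisSim algorithm from the paper (it does no path traversal, first-meeting decomposition, or pruning); the function name `simrank_truncated`, the decay value 0.8, and the toy graph are assumptions chosen for illustration only.

```python
from itertools import product


def simrank_truncated(in_neighbors, decay=0.8, iterations=2):
    """Naive SimRank iteration, stopped after `iterations` rounds.

    `in_neighbors` maps every node to the list of its in-neighbors
    (hypothetical input format; every node must appear as a key).
    """
    nodes = list(in_neighbors)
    # Iteration 0: a node is similar only to itself.
    scores = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        updated = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                updated[(a, b)] = 1.0
                continue
            ins_a, ins_b = in_neighbors[a], in_neighbors[b]
            if not ins_a or not ins_b:
                # SimRank is defined as 0 when either node has no in-neighbors.
                updated[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by the decay.
            total = sum(scores[(i, j)] for i in ins_a for j in ins_b)
            updated[(a, b)] = decay * total / (len(ins_a) * len(ins_b))
        scores = updated
    return scores


# Toy graph (assumed): w -> u, w -> v, u -> a, v -> b.
graph = {"w": [], "u": ["w"], "v": ["w"], "a": ["u"], "b": ["v"]}
scores = simrank_truncated(graph)
```

In this toy graph, `a` and `b` only meet at `w` after two backward steps, so their score is zero after one iteration and becomes nonzero at the second. This is the kind of two-step contribution that the abstract separates from the one-step case.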

Fast and Flexible Top-k Similarity Search on Large Networks

Panther: Fast Top-K Similarity Search on Large Networks

An efficient similarity search framework for SimRank over large dynamic graphs

A Fast Sketch-Based Approach of Top-k Closeness Centrality Search on Large Networks

Efficient Top-K SimRank-based Similarity Join

Fast Top-K Simple Shortest Paths Discovery in Graphs

Towards Distributed Node Similarity Search on Graphs

Top-k Community Similarity Search Over Large-Scale Road Networks (Technical Report)

UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

Efficient index-free SimRank similarity search in large graphs by discounting path lengths

Link Prediction Based on Sampling in Complex Networks

Top-k Graph Similarity Search Algorithm Based on Chi-Square Statistics in Probabilistic Graphs

Ground-state energy of jellium.

HetFS: a Method for Fast Similarity Search with Ad-Hoc Meta-Paths on Heterogeneous Information Networks

A Fast and Efficient Algorithm for Mining Top-k Nodes in Complex Networks

HighSim: Highly Effective Similarity Measurement in Large Heterogeneous Information Networks

Efficient and Accurate SimRank-Based Similarity Joins: Experiments, Analysis, and Improvement

Efficient Algorithm for Computing Link-Based Similarity in Real World Networks

We Know Who You Are: Discovering Similar Groups Across Multiple Social Networks

SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks

A Fast Sketch Method for Mining User Similarities over Fully Dynamic Graph Streams