S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

Yuanzhe Cai, Pei Li, Hongyan Liu, Jun He, Xiaoyong Du
DOI: https://doi.org/10.1007/978-3-540-88192-6_30
2009-01-01
Abstract: Content analysis and link analysis each have their advantages in measuring relationships among documents. In this paper, we propose a new method that combines the two to compute the similarity of research papers, so that these papers can be clustered more accurately. To improve the efficiency of similarity calculation, we develop a strategy to deal with the relationship graph separately without affecting accuracy. We also design an approach to assign different weights to different links to the papers, which can enhance the accuracy of similarity calculation. Experimental results on the ACM data set show that our new algorithm, S-SimRank, outperforms other algorithms.