Using Link-Based Content Analysis to Measure Document Similarity Effectively

Pei Li, Zhixu Li, Hongyan Liu, Jun He, Xiaoyong Du
DOI: https://doi.org/10.1007/978-3-642-00672-2_40
2009-01-01
Abstract: As a massive amount of information is placed online, it is a challenge to exploit both the internal and external information of documents when assessing the similarity between them. A variety of approaches have been proposed to model document similarity on different foundations, but they usually cannot combine internal and external information. In this paper, we introduce a link-based method, based on random walks on graphs, into content analysis. By defining similarity as the meeting probability of two random surfers, we propose a computational model for content analysis that can also be integrated with external information of documents. An empirical study shows that our method achieves good accuracy, acceptable performance, and a fast convergence rate in multi-relational document similarity measurement.
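
The idea of scoring similarity by the meeting probability of two random surfers can be illustrated with a small SimRank-style iteration. This is only a minimal sketch under stated assumptions, not the authors' model: the abstract does not give the paper's equations, so the update rule below is the generic SimRank recurrence. The function name `meeting_similarity`, the `decay` and `iterations` parameters, and the toy document-term graph are all hypothetical.

```python
from itertools import product


def meeting_similarity(in_links, decay=0.8, iterations=10):
    """SimRank-style similarity (an assumed stand-in, not the paper's model).

    in_links maps every node to the list of nodes that link to it.
    Two surfers start at nodes a and b and step backward along in-links;
    the score grows with how soon they are expected to meet.
    """
    nodes = sorted(in_links)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_links[a], in_links[b]
            if not ia or not ib:
                # A surfer with nowhere to go can never meet the other one.
                new_sim[(a, b)] = 0.0
                continue
            # Average the similarity over all pairs of in-neighbors.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: terms link to the documents that contain them.
in_links = {
    "doc1": ["termA", "termB"],
    "doc2": ["termA", "termB"],
    "doc3": ["termC"],
    "termA": [],
    "termB": [],
    "termC": [],
}

sim = meeting_similarity(in_links)
print(sim[("doc1", "doc2")], sim[("doc1", "doc3")])
```

In this toy graph, doc1 and doc2 share all of their terms and so score higher than doc1 and doc3, which share none. External information such as hyperlinks or citations could be added as further edges in the same graph, which is one plausible reading of how internal and external information might be combined.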