Document similarity measure based on named entity

Jimin Jia, Nenghai Yu, Pingbo Yuan, Chao Chen
2007-01-01
Journal of Computational Information Systems
Abstract: Document similarity measures are essential to many text analysis tasks, such as text retrieval, document clustering, and multi-document summarization. The traditional vector space model, though simple and direct, represents a document as a bag-of-words vector and ignores the valuable, informative content the document contains. In this paper, we propose incorporating named entities into the document similarity measure, using a model that combines the named entities with the primary document. In addition, we propose a novel method for evaluating the performance of the measure. Experiments show that our method yields statistically better performance.