Extract Salient Words with WordRank for Effective Similarity Search in Text Data

Xiaojun Wan, Jianwu Yang
DOI: https://doi.org/10.1007/11581062_54
2005-01-01
Abstract: We propose a method named WordRank that extracts a few salient words from a target document and then uses these words to retrieve similar documents with popular retrieval functions. The set of extracted words is a concise, topic-oriented representation of the target document; it reduces the ambiguous and noisy information in the document and thereby improves retrieval performance. Experimental results demonstrate that the proposed approach is highly effective.
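The abstract gives only the two-stage idea (extract salient words, then retrieve with them), not the scoring details. The following is a minimal sketch of that pipeline under loudly stated assumptions: word salience is approximated here by a PageRank-style iteration over a word co-occurrence graph, and retrieval by cosine similarity over raw term counts. Neither choice is confirmed by the abstract as the actual WordRank formulation, and all names (`salient_words`, `retrieve`, `STOPWORDS`, the window size, the damping factor) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "with", "to", "is", "are", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def salient_words(text, k=5, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on an undirected co-occurrence graph.

    ASSUMPTION: this graph-ranking scheme stands in for WordRank, whose exact
    definition is not given in the abstract.
    """
    words = tokenize(text)
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        # Link each word to the next `window` words.
        for other in words[i + 1:i + 1 + window]:
            if other != w:
                neighbors[w].add(other)
                neighbors[other].add(w)
    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(score, key=lambda w: (-score[w], w))[:k]


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(target, corpus, k=5):
    """Return corpus indices ordered from most to least similar to the target.

    ASSUMPTION: cosine over raw counts is a placeholder for the "popular
    retrieval functions" mentioned in the abstract.
    """
    query = Counter(salient_words(target, k))
    scored = [(cosine(query, Counter(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy data, invented purely for illustration.
target = ("Word graph ranking extracts salient words; "
          "the graph ranking helps retrieval of similar documents.")
corpus = [
    "Graph ranking of words for document retrieval",
    "A recipe for tomato soup with basil",
    "Ranking words in a graph improves retrieval of similar documents",
]
ranking = retrieve(target, corpus)
```

In this toy setup the query consists only of the top-ranked words of the target document, so a document that shares none of them (the soup recipe) receives a similarity of zero and falls to the end of `ranking`. Swapping `cosine` for another retrieval function would leave the extraction stage unchanged, which is the separation the abstract describes.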