Efficient Top-k Similar Short Texts Extraction Algorithm

Yanhui GU, Bin ZHAO, Junsheng ZHOU, Weiguang QU
DOI: https://doi.org/10.3778/j.issn.1673-9418.1403053
2014-01-01
Abstract: Extracting similar short texts efficiently is an essential research issue for many applications. However, most existing strategies focus on effectiveness. The existing state-of-the-art strategies cannot satisfy users' performance requirements, even though efficiency is important, especially for current big data applications. This paper addresses the efficiency issue of extracting similar short texts, i.e., how to efficiently retrieve the top-k short texts that are semantically most similar to a query from a given sentence collection. This paper also proposes an efficient strategy, built on a basic framework, to tackle the performance problems. Extensive experimental evaluations demonstrate that the proposed strategy improves extraction efficiency while maintaining effectiveness, and that it is more efficient than the existing strategies.
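To make the problem statement concrete, the following is a minimal sketch of the exhaustive baseline that top-k extraction aims to improve on: score every sentence in the collection against the query and keep the k best. It is not the strategy proposed in the paper. The function names and the token-overlap (Jaccard) score are assumptions chosen for illustration only; a real system would substitute a semantic similarity measure.

```python
import heapq


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for a semantic measure (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def top_k_similar(query, collection, k, sim=jaccard):
    """Exhaustive baseline: score every text, return the k highest as (score, text)."""
    # heapq.nlargest keeps only k candidates in memory while scanning the collection.
    return heapq.nlargest(k, ((sim(query, s), s) for s in collection))


collection = ["the cat sat on the mat", "a dog barked", "the cat sat"]
result = top_k_similar("the cat sat", collection, k=2)
```

This baseline evaluates the similarity function once per sentence, so its cost grows linearly with the collection size. That per-query scan is the kind of overhead an efficiency-oriented strategy would try to avoid, for example by skipping candidates that cannot reach the current top k.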