Short Text Similarity Calculation Based on Jaccard and Semantic Mixture

Shushu Wu, Fang Liu, Kai Zhang
DOI: https://doi.org/10.1007/978-981-16-1354-8_4
2020-01-01
Abstract: To improve the accuracy of short text similarity calculation, a method that combines Jaccard similarity with semantic similarity is proposed. Jaccard is a traditional similarity algorithm based on literal matching; it considers only word form, so its ability to capture semantics is limited. Word vectors can represent semantic similarity through the cosine similarity of two terms in the vector space, and the semantic similarity of two sentences is obtained by summing and averaging their word-level similarities according to a certain method. The two measures are then weighted to compute the final text similarity. Experiments show that the algorithm improves the recall rate and F value of short text similarity calculation to some extent.
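The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the averaging scheme or the weight, so the best-match averaging in `directed_semantic`, the symmetric mean in `semantic`, the mixing weight `alpha`, and the `vectors` lookup table (a plain dict mapping each word to a list of floats) are all assumptions made for illustration.

```python
import math


def jaccard(a_tokens, b_tokens):
    """Literal-overlap similarity: |A ∩ B| / |A ∪ B| over the word sets."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 1.0  # convention chosen here for two empty texts
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_semantic(src, dst, vectors):
    """Assumed scheme: average, over words in src, of the best cosine match in dst.

    Words missing from `vectors` are skipped.
    """
    scores = []
    for word in src:
        if word not in vectors:
            continue
        best = max(
            (cosine(vectors[word], vectors[t]) for t in dst if t in vectors),
            default=0.0,
        )
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


def semantic(a_tokens, b_tokens, vectors):
    """Symmetric semantic similarity: mean of the two directed scores (assumption)."""
    forward = directed_semantic(a_tokens, b_tokens, vectors)
    backward = directed_semantic(b_tokens, a_tokens, vectors)
    return 0.5 * (forward + backward)


def mixed_similarity(a_tokens, b_tokens, vectors, alpha=0.5):
    """Weighted mixture of the literal and semantic scores; alpha is hypothetical."""
    literal = jaccard(a_tokens, b_tokens)
    meaning = semantic(a_tokens, b_tokens, vectors)
    return alpha * literal + (1.0 - alpha) * meaning
```

With pretrained word vectors loaded into `vectors`, two tokenized sentences that share few surface words but have close embeddings would receive a low Jaccard score and a higher semantic score, and `alpha` controls how much each contributes to the final value.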