A Combined Measure for Text Semantic Similarity

Hao-Di Li, Qing-Cai Chen, Xiao-Long Wang
DOI: https://doi.org/10.1109/icmlc.2013.6890900
2013-01-01
Abstract: With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering, and data mining. Most existing semantic similarity algorithms are either statistical or rule-based methods that operate on ontology dictionaries or other knowledge bases. Rule-based methods usually rely on a dictionary, an ontology tree or graph, or the number of co-occurring attributes, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and is capable of reducing knowledge noise, it usually achieves better performance. Nevertheless, due to the imbalanced distribution of items in a knowledge base, the semantic similarity results for low-frequency words are usually poor.