A Semantic Similarity Computing Approach Based on WordNet and Corpus Statistics

张东娜,周春光,刘彦斌,郭东伟
DOI: https://doi.org/10.13413/j.cnki.jdxblxb.2010.05.027
2010-01-01
Abstract: We first propose a new method for calculating information content, a parameter used in semantic similarity. The new algorithm is based on the semantic information of concepts in the WordNet knowledge base and on corpus probability, expressed as self-information. Then, considering that existing algorithms are all domain-related and that their calculation processes are complicated, we propose a universal method for calculating semantic similarity based on corpus statistics and WordNet, which can be used in information extraction, information retrieval, document clustering, and ontology learning. Experiments on the benchmark data set of RB concept pairs show that the proposed method yields a substantial improvement.
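The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it names: information content computed as corpus self-information, IC(c) = -log p(c), over a concept hierarchy. The taxonomy, the counts, and all function names are hypothetical, and the similarity shown is the well-known Lin measure used as a stand-in, not the authors' proposed method.

```python
import math

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet hypernym links.
PARENT = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "apple": "fruit",
    "vehicle": "entity",
    "fruit": "entity",
}

# Hypothetical corpus counts for each concept's own occurrences.
COUNTS = {"car": 40, "bicycle": 10, "apple": 30, "fruit": 15, "vehicle": 5, "entity": 0}


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def concept_frequency(concept):
    """Occurrences of the concept plus those of every concept it subsumes."""
    return sum(n for c, n in COUNTS.items() if concept in ancestors(c))


TOTAL = concept_frequency("entity")  # the root subsumes every occurrence


def information_content(concept):
    """Self-information -log p(c); assumes the concept frequency is non-zero."""
    return -math.log(concept_frequency(concept) / TOTAL)


def lowest_common_subsumer(a, b):
    """First ancestor of a (walking upward) that is also an ancestor of b."""
    ancestors_b = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in ancestors_b:
            return candidate
    raise ValueError("concepts share no ancestor")


def lin_similarity(a, b):
    """Lin's measure: 2 * IC(lcs) / (IC(a) + IC(b)); a stand-in, not the paper's formula."""
    denominator = information_content(a) + information_content(b)
    if denominator == 0:
        return 1.0
    return 2 * information_content(lowest_common_subsumer(a, b)) / denominator


print(lin_similarity("car", "bicycle"))
print(lin_similarity("car", "apple"))
```

In this toy setup the root concept carries no information, so pairs whose only shared ancestor is the root score zero, while pairs that share a more specific ancestor score higher.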