Word Similarity Measurement Based on BaiduBaike

ZHAN Zhi-jian, LIANG Li-na, YANG Xiao-ping
DOI: https://doi.org/10.3969/j.issn.1002-137x.2013.06.043
2013-01-01
Computer Science
Abstract: Research on word similarity measurement has been popular not only in natural language processing but also in other basic research. Traditional word similarity measures rely on semantic lexicons or large-scale corpora. We first discuss the application background of word similarity measurement, including information retrieval, information extraction, text classification, and example-based machine translation. We then summarize two strategies for measuring word similarity: one is based on an ontology or a semantic taxonomy, the other on collocations of words in a large corpus. BaiduBaike, an open online encyclopedia, can serve not only as a corpus but also as a knowledge resource with rich semantic information. Drawing on this semantic information and on the BaiduBaike category graph, we propose a new method that analyzes and computes Chinese word similarity along four dimensions: the baike card, the content of the word, the open classification of the word, and the correlation words. We use a language network to select the top key terms from the content of a word. Based on vector space model (VSM) theory, we calculate the similarity between the corresponding parts of two words. We present a new multi-path searching algorithm on the BaiduBaike category graph. Finally, we propose a comprehensive similarity measure that combines the four parts. Experimental results show that the method performs well.
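As a rough illustration of the VSM step and of combining several part-level scores, the following Python sketch computes cosine similarity between two bags of key terms and then merges per-part scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how terms are weighted or how the four parts are combined, so raw term counts and a weighted average are assumed here, and the function names, part names, and weights are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of terms (raw counts assumed)."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    # Dot product over the terms the two vectors share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term list has no direction to compare
    return dot / (norm_a * norm_b)


def combined_similarity(part_scores, weights):
    """Weighted average of per-part scores (assumed combination rule)."""
    total = sum(weights[name] for name in part_scores)
    if total == 0:
        return 0.0
    weighted = sum(weights[name] * score for name, score in part_scores.items())
    return weighted / total


# Hypothetical part names and weights, for illustration only.
scores = {
    "card": cosine_similarity(["fruit", "tree"], ["fruit", "plant"]),
    "content": cosine_similarity(["red", "sweet", "fruit"], ["sweet", "fruit"]),
}
overall = combined_similarity(scores, {"card": 1.0, "content": 1.0})
```

The category-graph part described in the abstract would need a graph search rather than a vector comparison, so it is left out of this sketch.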