Minimum Normalized Google Distance for Unsupervised Multilingual Chinese-English Word Sense Disambiguation

Pengyuan Liu, Yongzeng Xue, Shiqi Li, Shui Liu
DOI: https://doi.org/10.1109/icgec.2010.69
2010-01-01
Abstract: This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised disambiguation method. The normalized Google distance is a measure of similarity between words and phrases, grounded in information distance and Kolmogorov complexity, that uses the World Wide Web as its database and takes its page counts from a search engine such as Google. The method treats word sense disambiguation as a search for the minimum normalized Google distance between an n-gram and a translation or synonym of the target word, under the assumption of one sense per n-gram. The system is evaluated on the Multilingual Chinese-English Lexical Sample task of SemEval-2007. Experimental results show that the method outperforms the best competing system. An experiment on the nouns of this dataset also gives promising results.
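The search described in the abstract can be illustrated with a minimal Python sketch. It uses the standard definition of the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives page counts and N is the number of indexed pages. The abstract does not give the paper's implementation, so the function names (`ngd`, `pick_sense`), the placeholder labels `sense_a` and `sense_b`, and all page counts below are assumptions made up for illustration; a real system would obtain the counts from search-engine queries.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from page counts.

    fx, fy: pages containing x and y alone; fxy: pages containing both;
    n: total number of indexed pages (assumed larger than fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No co-occurrence evidence: treat the pair as maximally distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def pick_sense(ngram_count, candidates, n):
    """Return the candidate with the minimum NGD to the n-gram.

    candidates maps a translation or synonym of the target word to a pair
    (page count of the candidate, joint page count with the n-gram).
    """
    return min(
        candidates,
        key=lambda c: ngd(ngram_count, candidates[c][0], candidates[c][1], n),
    )


# Hypothetical counts, not taken from the paper or from any search engine.
example_candidates = {
    "sense_a": (50_000, 800),
    "sense_b": (30_000, 20),
}
chosen = pick_sense(2_000, example_candidates, 10**9)
print(chosen)
```

Under the one-sense-per-n-gram assumption, the candidate returned for an n-gram would be taken as the sense of the target word in that context. Returning infinity for zero counts is a design choice of this sketch, so that candidates that never co-occur with the n-gram are never selected over ones that do.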