Covering ambiguity resolution in Chinese word segmentation based on contextual information

Xiao Luo, Maosong Sun, Benjamin K. Tsou
DOI: https://doi.org/10.3115/1072228.1072283
2002-01-01
Abstract: Covering ambiguity is one of the two basic types of ambiguity in Chinese word segmentation. We regard its resolution as equivalent to word sense disambiguation, and use the classical vector space model from information retrieval to represent the contexts of ambiguous words. A variant of TF-IDF weighting is proposed, and a Chinese thesaurus is additionally used to cope with the data sparseness problem. We select 90 frequent cases of covering ambiguity as targets. The training set contains 77,654 sentences and the test set contains 19,242 sentences. Experimental results show that our model achieves 96.58% accuracy, outperforming both the original form of TF-IDF weighting and another baseline, the hidden Markov model.
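To make the vector space formulation concrete, the following is a minimal sketch, not the authors' implementation. It treats one covering ambiguity as a two-class decision ("whole" word versus "split" into two words), builds one TF-IDF vector per class from the context words seen in training, and assigns a new context to the class with the highest cosine similarity. The abstract does not specify the proposed TF-IDF variant or how the thesaurus is applied, so this sketch assumes a textbook smoothed TF-IDF and omits the thesaurus. The function names, class labels, and toy contexts (for the string 马上, "immediately" versus 马/上, "on the horse") are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(class_contexts):
    """Build one TF-IDF vector per class, treating each class as a 'document'.

    class_contexts maps a class label to a list of contexts,
    where each context is a list of surrounding words.
    """
    n_classes = len(class_contexts)
    # Term frequency: how often each context word occurs with each class.
    tf = {label: Counter(w for ctx in ctxs for w in ctx)
          for label, ctxs in class_contexts.items()}
    # Document frequency: in how many classes each word occurs.
    df = Counter(w for counts in tf.values() for w in counts)
    # Smoothed IDF (an assumption; the paper's variant is not given here).
    idf = {w: math.log((1 + n_classes) / (1 + d)) + 1.0 for w, d in df.items()}
    vectors = {label: {w: c * idf[w] for w, c in counts.items()}
               for label, counts in tf.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(context, vectors, idf):
    """Return the class whose vector is most similar to the context."""
    # Words never seen in training carry no weight in this sketch.
    counts = Counter(w for w in context if w in idf)
    query = {w: c * idf[w] for w, c in counts.items()}
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Toy training contexts (hypothetical, for illustration only).
train = {
    "whole": [["我", "回来"], ["他", "就", "到"], ["请", "回来"]],
    "split": [["他", "骑", "在"], ["骑", "在", "摔", "下来"]],
}
vectors, idf = tfidf_vectors(train)

print(classify(["请", "就", "回来"], vectors, idf))
print(classify(["骑", "在"], vectors, idf))
```

With sparse training data many context words never recur, which is the data sparseness problem the abstract mentions; mapping words to thesaurus categories before counting would be one way to address it, though the exact procedure used in the paper is not stated here.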