Word Sense Disambiguation Corpus Acquisition by Language Model Validation

GUO Yu-hang, CHE Wan-xiang, LIU Ting
DOI: https://doi.org/10.3969/j.issn.1003-0077.2008.06.007
2008-01-01
Abstract: The lack of hand-crafted training data is a critical issue for supervised word sense disambiguation (WSD) systems. Substituting monosemous lexical relatives for target words has been proposed as a way to acquire WSD corpora from the Web automatically. However, in some cases the target word cannot suitably replace its monosemous lexical relative, and the substitution introduces noise. We propose a language model validation method to filter out this noise, which purifies the training data and improves performance accordingly. Our experiments on the Senseval-3 Chinese lexical sample task show that a system trained on Web-acquired data with language model validation achieves better accuracy than one trained without it.
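The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the language model, the scoring function, or the acceptance rule, so the add-one-smoothed bigram model, the average log-probability score, and the fixed threshold below are all assumptions, and the function names (train_bigram, avg_log_prob, filter_by_lm) are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams over tokenized sentences (assumed model, not the paper's)."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    vocab = {w for s in sentences for w in s} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def avg_log_prob(tokens, model):
    """Average add-one-smoothed bigram log-probability per transition."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total / (len(padded) - 1)


def filter_by_lm(snippets, relative, target, model, threshold):
    """Replace the monosemous relative with the target word and keep only
    snippets whose substituted form scores at or above the threshold
    (the threshold rule is an assumption)."""
    kept = []
    for tokens in snippets:
        substituted = [target if t == relative else t for t in tokens]
        if avg_log_prob(substituted, model) >= threshold:
            kept.append(substituted)
    return kept
```

In this sketch, a Web snippet retrieved with a monosemous relative is kept as a training example for the target word only if the sentence still looks fluent to the language model after the substitution. In practice the threshold would need to be tuned on held-out data.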