Word Sense Disambiguation Method with Topic Feature

Yun Zhou, Ting Wang, Zhiyuan Wang, Lupeng Zhang
DOI: https://doi.org/10.1049/cp.2012.2305
2012-01-01
Abstract: Word sense disambiguation (WSD) is usually confined to a single sentence, so the available context is short text. Moreover, the shortage of sense-labelled corpora causes serious data sparsity. Short text and data sparsity together limit improvements in WSD performance. As an unsupervised learning method, a topic model clusters and compresses the semantic information in text, which improves generalization over words. This paper proposes a WSD method that enhances the classifier with LDA (Latent Dirichlet Allocation) topic features inferred from a background corpus, and evaluates it on the Senseval-3 all-words WSD task. Using only part of SemCor as labelled training data, the proposed method reaches an F1 of 0.680, which exceeds both the best Senseval-3 system (0.652) and, to our knowledge, the best result reported in the literature (0.670). Experimental results also show that an appropriate number of topics benefits WSD; that consistency between the background corpus and the evaluation dataset is the key to improving WSD; and that a larger, balanced background corpus yields a greater performance gain.
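The idea of augmenting a WSD classifier with topic features can be illustrated with a minimal sketch. The abstract does not specify the classifier, the local feature set, the inference procedure, or any hyperparameters, so everything below is an assumption made for illustration: a small collapsed Gibbs sampler stands in for LDA training on the background corpus, and the hypothetical helpers `train_lda`, `infer_topics`, and `build_features` are not the authors' implementation.

```python
import random
from collections import Counter

def _weights(w, local, topic_word, topic_total, vocab_size, alpha, beta):
    # Unnormalized p(topic | word, current counts) for collapsed Gibbs sampling.
    return [(local[k] + alpha) * (topic_word[k][w] + beta)
            / (topic_total[k] + vocab_size * beta)
            for k in range(len(local))]

def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Fit LDA on a tokenized background corpus (hyperparameters are assumed)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token, resample its topic, then add it back.
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                ws = _weights(w, doc_topic[d], topic_word, topic_total,
                              vocab_size, alpha, beta)
                z = rng.choices(range(num_topics), ws)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word, topic_total, vocab_size

def infer_topics(tokens, model, num_topics, alpha=0.1, beta=0.01,
                 iters=50, seed=0):
    """Estimate the topic distribution of a short context, model held fixed."""
    topic_word, topic_total, vocab_size = model
    rng = random.Random(seed)
    zs = [rng.randrange(num_topics) for _ in tokens]
    counts = [zs.count(k) for k in range(num_topics)]
    for _ in range(iters):
        for i, w in enumerate(tokens):
            counts[zs[i]] -= 1
            ws = _weights(w, counts, topic_word, topic_total,
                          vocab_size, alpha, beta)
            zs[i] = rng.choices(range(num_topics), ws)[0]
            counts[zs[i]] += 1
    total = len(tokens) + num_topics * alpha
    return [(counts[k] + alpha) / total for k in range(num_topics)]

def build_features(context, model, num_topics):
    """Combine local bag-of-words features with inferred topic features."""
    features = {"w=" + w: 1.0 for w in context}
    theta = infer_topics(context, model, num_topics)
    for k, p in enumerate(theta):
        features["topic_%d" % k] = p
    return features

# Toy background corpus, invented purely for this illustration.
BACKGROUND = [
    ["bank", "loan", "money", "interest", "account"],
    ["river", "bank", "water", "fish", "shore"],
    ["money", "account", "deposit", "bank", "loan"],
    ["water", "shore", "river", "boat", "fish"],
]

if __name__ == "__main__":
    model = train_lda(BACKGROUND, num_topics=2)
    print(build_features(["bank", "river", "water"], model, num_topics=2))
```

The resulting feature dictionary could be passed to any supervised classifier trained on sense-labelled examples. The number of topics is the parameter the abstract reports as affecting WSD performance, so in practice it would be tuned rather than fixed at 2.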