Survey of Word Sense Annotated Corpus Construction

JIN Peng, WU Yun-fang, YU Shi-wen
DOI: https://doi.org/10.3969/j.issn.1003-0077.2008.03.002
2008-01-01
Abstract: The bottleneck of word sense disambiguation (WSD) is the lack of large-scale, high-quality word sense annotated corpora. This paper introduces several word sense annotated corpora for English, Chinese and Japanese, describing each in terms of corpus coverage, the sense dictionary used, the number of tokens and word types annotated, and inter-annotator agreement. Among automatic and semi-automatic construction methods, the paper focuses on bootstrapping and on approaches based on word-aligned parallel corpora. Finally, it points out several open issues in word sense annotated corpus construction and suggests possible solutions.
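Inter-annotator agreement is one of the dimensions along which the surveyed corpora are compared, but the abstract does not say which agreement measure each corpus reports. As an illustration only, the sketch below computes two common measures for two annotators who tagged the same instances: raw observed agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the sense labels `s1`, `s2`, `s3` are hypothetical and not taken from the paper.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of instances on which two annotators chose the same sense tag."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty tag sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Chance agreement p_e is estimated from each annotator's own tag distribution.
    """
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    tags = set(count_a) | set(count_b)
    p_e = sum(count_a[t] * count_b[t] for t in tags) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical tag throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical sense tags from two annotators for six occurrences of one word.
    annotator_1 = ["s1", "s1", "s2", "s2", "s1", "s3"]
    annotator_2 = ["s1", "s2", "s2", "s2", "s1", "s3"]
    print(round(observed_agreement(annotator_1, annotator_2), 3))  # → 0.833
    print(round(cohen_kappa(annotator_1, annotator_2), 3))         # → 0.739
```

Here the annotators agree on 5 of 6 instances, but because both use sense `s1` and `s2` often, some of that agreement is expected by chance, so kappa (17/23) is lower than raw agreement (5/6).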