Predicting Categorial Sememe for English-Chinese Word Pairs via Representations in Explainable Sememe Space.

Baoju Liu, Lei Hou, Xin Lv, Juanzi Li, Jinghui Xiao
DOI: https://doi.org/10.1007/978-3-030-88480-2_3
2021-01-01
Abstract: A sememe is the minimum unambiguous semantic unit in human language. Sememe knowledge bases (SKBs) have proven effective in many NLP tasks. The categorial sememe, which indicates the basic category of a word sense and bridges the lexicon and semantics, is indispensable in an SKB. However, manual categorial sememe annotation is costly. This paper proposes a new task for building SKBs automatically: English-Chinese Word Pair Categorial Sememe Prediction. Bilingual information is used to resolve ambiguity. The method introduces a sememe space, in which sememes, words, and word senses are represented as vectors with interpretable semantics, to bridge the semantic gap between sememes and words. Extensive experiments and analyses validate the effectiveness of the proposed method. Using this method, we predict categorial sememes for 113,014 new word senses, with a prediction MAP of 85.8%. Furthermore, we conduct expert annotation based on the prediction results and expand HowNet by nearly 50%. We will publish all the data and code.
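
The abstract reports prediction quality as MAP (mean average precision). As a rough illustration of how such a score can be computed over ranked sememe candidates, here is a minimal sketch. It is not the authors' evaluation code: the function names, the toy word pairs, and the sememe labels are all invented for this example and are not taken from HowNet or from the paper.

```python
from typing import Dict, Hashable, List, Set


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Average precision of one ranked candidate list against a gold set."""
    if not gold:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(gold)


def mean_average_precision(
    predictions: Dict[Hashable, List[str]],
    gold_labels: Dict[Hashable, Set[str]],
) -> float:
    """Mean of per-item average precision over every item with gold labels."""
    if not gold_labels:
        return 0.0
    scores = [
        average_precision(predictions.get(key, []), gold)
        for key, gold in gold_labels.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: keys are (English, Chinese) word pairs, values are
# ranked candidate sememes. The labels are made up for illustration.
predictions = {
    ("bank", "银行"): ["institution", "land", "money"],
    ("bank", "河岸"): ["institution", "land", "waters"],
}
gold = {
    ("bank", "银行"): {"institution"},
    ("bank", "河岸"): {"land"},
}

score = mean_average_precision(predictions, gold)
print(f"MAP = {score:.3f}")
```

When each word sense has exactly one gold label, as in this toy data, average precision reduces to the reciprocal rank of that label, so MAP coincides with mean reciprocal rank.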