Sememe Tree Prediction for English-Chinese Word Pairs.

Baoju Liu, Xuejun Shang, Liqing Liu, Yuanpeng Tan, Lei Hou, Juanzi Li
DOI: https://doi.org/10.1007/978-981-16-1964-9_2
2020-01-01
Abstract: A sememe is the minimum unambiguous semantic unit in human language. In a sememe knowledge base, the semantics of each word sense are encoded and expressed as a sememe tree. Sememe knowledge benefits many NLP tasks, but constructing a sememe knowledge base manually is time-consuming. One existing work touches on sememe tree prediction, but it has two limitations: it treats the word, rather than the word sense, as the unit of prediction, and its method handles only words that have dictionary definitions, not all words. In this article, we use English-Chinese bilingual information to help disambiguate word senses. We propose the English-Chinese bilingual sememe tree prediction task, which can automatically extend the well-known knowledge base HowNet, and we propose two methods for it. For a given word pair with a categorial sememe, the first method starts from the root node and uses neural networks to generate edges and nodes step by step in depth-first order. The second is a recommendation-based method: for a given word pair with a categorial sememe, we find word pairs that share the same categorial sememe and are semantically similar to it, and construct a propagation function that transfers the sememe tree information of these word pairs to the word pair to be predicted. Experiments show that our method achieves an F1 score of 84.0%. Further, using the Oxford English-Chinese Bilingual Dictionary as data, we add about 90,000 word pairs to HowNet, expanding it by nearly half.
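The recommendation-based method can be illustrated with a minimal sketch. The abstract does not specify the propagation function, so everything below is an assumption made for illustration: word pairs are represented by hypothetical embedding vectors, a sememe tree is flattened into a set of (parent, relation, child) edges, similarity is cosine similarity, and an edge is kept when the normalized similarity-weighted vote of the nearest neighbors reaches a threshold. The names `predict_edges`, `k`, and `threshold` are not from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_edges(target_vec, target_root, known_pairs, k=2, threshold=0.5):
    """Propagate sememe-tree edges from similar, already-annotated word pairs.

    Assumed (hypothetical) data layout: each known pair is a dict with
    "root" (categorial sememe), "vec" (embedding), and "edges"
    (set of (parent, relation, child) tuples).
    """
    # Only word pairs with the same categorial sememe are eligible.
    candidates = [p for p in known_pairs if p["root"] == target_root]

    # Keep the k most similar candidates.
    scored = sorted(
        ((cosine(target_vec, p["vec"]), p) for p in candidates),
        key=lambda sp: sp[0],
        reverse=True,
    )[:k]

    # Negative similarities contribute nothing to the vote.
    total = sum(max(s, 0.0) for s, _ in scored)
    if total == 0:
        return set()

    # Each neighbor votes for its edges with its normalized similarity.
    votes = defaultdict(float)
    for s, p in scored:
        for edge in p["edges"]:
            votes[edge] += max(s, 0.0) / total

    return {edge for edge, weight in votes.items() if weight >= threshold}


# Toy data, invented for this example only.
known = [
    {"root": "human", "vec": [1.0, 0.0],
     "edges": {("human", "HostOf", "Occupation"), ("human", "domain", "education")}},
    {"root": "human", "vec": [0.9, 0.1],
     "edges": {("human", "HostOf", "Occupation"), ("human", "domain", "medical")}},
    {"root": "tool", "vec": [1.0, 0.0],
     "edges": {("tool", "domain", "education")}},
]
predicted = predict_edges([1.0, 0.0], "human", known, k=2, threshold=0.6)
```

With this toy data, only the edge shared by both "human" neighbors collects enough weight to pass the threshold, while edges that appear in a single neighbor are dropped. A real system would also need to reassemble the kept edges into a well-formed tree, which this sketch does not attempt.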