Lexical Sememe Prediction Via Word Embeddings and Matrix Factorization.

Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun
DOI: https://doi.org/10.24963/ijcai.2017/587
2017-01-01
Abstract: Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and formed linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, and it suffers from significant annotation inconsistency and noise. In this paper, we explore, for the first time, automatically predicting lexical sememes based on the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we use the real-world sememe knowledge base HowNet for training and evaluation, and the results demonstrate the effectiveness of our method for lexical sememe prediction. Our method will be of great use for verifying annotations in existing noisy sememe knowledge bases and for suggesting annotations for new words and phrases.
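As a rough illustration of predicting sememes from word embeddings, the sketch below scores candidate sememes for a target word by letting already-annotated words vote for their sememes, weighted by cosine similarity to the target. This is only one plausible reading of the idea under stated assumptions, not the authors' actual model; the function name `predict_sememes`, the toy embeddings, and the toy annotation matrix are all hypothetical and are not taken from the paper or from HowNet.

```python
import numpy as np

def predict_sememes(target_vec, word_vecs, word_sememes, top_k=2):
    """Rank sememes for a target word by similarity-weighted voting.

    word_vecs:    (n_words, dim) embeddings of annotated words (toy data).
    word_sememes: (n_words, n_sememes) binary word-sememe annotation matrix.
    Returns the indices of the top_k highest-scoring sememes.
    """
    # Cosine similarity between the target and every annotated word.
    norms = np.linalg.norm(word_vecs, axis=1) * np.linalg.norm(target_vec)
    sims = (word_vecs @ target_vec) / norms
    # Each annotated word votes for its own sememes, weighted by similarity.
    scores = sims @ word_sememes
    # Highest score first.
    return np.argsort(-scores)[:top_k].tolist()

# Hypothetical toy data: 3 annotated words, 2-d embeddings, 3 sememes.
word_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
word_sememes = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])
target = np.array([1.0, 0.05])

print(predict_sememes(target, word_vecs, word_sememes))
```

In this toy setup the target lies close to the first two words, so the sememes they carry rank above the sememe of the distant third word. The matrix factorization component mentioned in the abstract is not covered by this sketch.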