Embedding for Words and Word Senses Based on Human Annotated Knowledge Base: A Case Study on HowNet

SUN Maosong, CHEN Xinxiong
2016-01-01
Abstract: This paper examines the necessity and effectiveness of encoding a human-annotated knowledge base into a neural network language model, using HowNet as a case study. Traditional word embeddings are derived from a neural network language model trained on a large-scale unlabeled text corpus. This approach has two weaknesses: first, the vectors learned for low-frequency words are generally of unsatisfactory quality; second, sense vectors for polysemous words are inherently unavailable. We propose neural network language models that systematically learn embeddings for all the semantic primitives defined in HowNet and, from these semantic primitive vectors, derive word vectors (in particular for low-frequency words) and word sense vectors. Preliminary experimental results show that our models improve performance on both word similarity and word sense disambiguation tasks. We believe that research on neural network language models incorporating human-annotated knowledge bases will be a critical issue deserving attention in the coming years.
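The idea of building word and sense vectors from semantic primitive vectors can be illustrated with a minimal sketch. The abstract does not specify how the primitives are combined, so the plain averaging used here is an assumption made for illustration, not the authors' model. The toy vectors, the sense inventory for "apple", and the function names (sense_vector, word_vector, disambiguate) are all hypothetical.

```python
import math

# Toy vectors for semantic primitives (sememes). Values are invented for
# illustration; in the paper these would be learned by the language model.
SEMEME_VECS = {
    "fruit": [0.9, 0.1, 0.0],
    "plant": [0.8, 0.2, 0.1],
    "computer": [0.0, 0.9, 0.3],
    "brand": [0.1, 0.7, 0.6],
}

# Hypothetical sense inventory: each sense of a word lists its sememes,
# loosely in the spirit of a HowNet-style annotation.
SENSES = {
    "apple": {
        "apple#fruit": ["fruit", "plant"],
        "apple#company": ["computer", "brand"],
    }
}


def mean(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sense_vector(sememes):
    """Assumption: a sense vector is the average of its sememe vectors."""
    return mean([SEMEME_VECS[s] for s in sememes])


def word_vector(word):
    """Assumption: a word vector is the average of its sense vectors."""
    return mean([sense_vector(s) for s in SENSES[word].values()])


def disambiguate(word, context_vec):
    """Pick the sense whose vector is most similar to the context vector."""
    senses = SENSES[word]
    return max(senses, key=lambda sid: cosine(sense_vector(senses[sid]), context_vec))


if __name__ == "__main__":
    print(word_vector("apple"))
    print(disambiguate("apple", SEMEME_VECS["computer"]))
```

Because every word is expressed through a shared, small set of primitives, a low-frequency word can still receive a usable vector as long as its primitives appear often enough elsewhere, which is the motivation the abstract gives for the approach.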