Integrating Extra Knowledge into Word Embedding Models for Biomedical NLP Tasks

Yuan Ling, Yuan An, Mengwen Liu, Sadid A. Hasan, Yetian Fan, Xiaohua Hu
DOI: https://doi.org/10.1109/ijcnn.2017.7965957
2017-01-01
Abstract: Word embedding has attracted increasing attention in the NLP area in recent years. The continuous bag-of-words (CBOW) model and the continuous Skip-gram (Skip-gram) model have been developed to learn distributed representations of words from large amounts of unlabeled text. In this paper, we explore integrating extra knowledge into the CBOW and Skip-gram models and applying the new models to biomedical NLP tasks. The main idea is to construct a weighted graph from knowledge bases (KBs) that represents structured relationships among words/concepts. In particular, we propose a GCBOW model and a GSkip-gram model, which integrate such a graph into the original CBOW and Skip-gram models, respectively, via graph regularization. Experiments on four standard general-domain datasets show encouraging improvements with the new models. Further evaluations on two biomedical NLP tasks (a biomedical similarity/relatedness task and a biomedical Information Retrieval (IR) task) show that our methods outperform the baselines.
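The abstract names graph regularization but does not give the exact GSkip-gram objective. The sketch below is therefore an illustration under stated assumptions, not the authors' method. It assumes (1) a Skip-gram loss trained with negative sampling, a swapped-in training choice the abstract does not confirm, and (2) a Laplacian-style penalty lam * sum of w_ij * ||v_i - v_j||^2 over weighted KB edges (i, j, w_ij), applied to the input vectors. The function names `sgns_loss`, `graph_penalty`, `total_loss`, `train_step` and the toy edge list are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_loss(W_in, W_out, center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair."""
    v = W_in[center]
    pos = np.log(sigmoid(W_out[context] @ v))
    neg = np.sum(np.log(sigmoid(-(W_out[negatives] @ v))))
    return -(pos + neg)


def graph_penalty(W_in, edges):
    """Hypothetical KB-graph regularizer: sum of w * ||v_i - v_j||^2 over edges."""
    return sum(w * np.sum((W_in[i] - W_in[j]) ** 2) for i, j, w in edges)


def total_loss(W_in, W_out, center, context, negatives, edges, lam):
    """Assumed objective: SGNS loss plus lam times the graph penalty."""
    return (sgns_loss(W_in, W_out, center, context, negatives)
            + lam * graph_penalty(W_in, edges))


def train_step(W_in, W_out, center, context, negatives, edges, lam, lr):
    """One in-place gradient-descent step on total_loss (all grads from old values)."""
    negatives = np.asarray(negatives)
    v = W_in[center].copy()
    u_pos = W_out[context].copy()
    U_neg = W_out[negatives].copy()

    # Derivatives of the SGNS loss w.r.t. the dot-product scores.
    g_pos = sigmoid(u_pos @ v) - 1.0  # scalar
    g_neg = sigmoid(U_neg @ v)        # shape (k,)

    grad_in = np.zeros_like(W_in)
    grad_in[center] += g_pos * u_pos + g_neg @ U_neg

    # Graph term: dR/dv_i = 2*lam*w*(v_i - v_j), dR/dv_j = -2*lam*w*(v_i - v_j).
    for i, j, w in edges:
        diff = W_in[i] - W_in[j]
        grad_in[i] += 2.0 * lam * w * diff
        grad_in[j] -= 2.0 * lam * w * diff

    # Output-vector updates; subtract.at accumulates repeated negative ids.
    W_out[context] -= lr * g_pos * v
    np.subtract.at(W_out, negatives, lr * np.outer(g_neg, v))
    W_in -= lr * grad_in
```

Setting `lam = 0` reduces the sketch to plain negative-sampling Skip-gram; a larger `lam` pulls KB-linked words toward each other even when they rarely co-occur in the corpus. A GCBOW analogue would, under the same assumption, add the same penalty to a CBOW loss instead.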