Refining Word Embeddings Using Intensity Scores for Sentiment Analysis

Liang-Chih Yu, Jin Wang, K. Robert Lai, Xuejie Zhang
DOI: https://doi.org/10.1109/TASLP.2017.2788182
2018-03-01
Abstract: Word embeddings, which provide continuous low-dimensional vector representations of words, have been extensively used for various natural language processing tasks. However, existing context-based word embeddings such as Word2vec and GloVe typically fail to capture sufficient sentiment information, which may result in words with similar vector representations having opposite sentiment polarity (e.g., good and bad), thus degrading sentiment analysis performance. To tackle this problem, recent studies have suggested learning sentiment embeddings that incorporate sentiment polarity (positive and negative) information from labeled corpora. This study adopts another strategy to learn sentiment embeddings. Instead of creating a new word embedding from labeled corpora, we propose a word vector refinement model that refines existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons. The idea of the refinement model is to improve each word vector such that it is closer to words in the lexicon that are both semantically and sentimentally similar (i.e., those with similar intensity scores) and further away from sentimentally dissimilar words (i.e., those with dissimilar intensity scores). An obvious advantage of the proposed method is that it can be applied to any pretrained word embeddings. In addition, the intensity scores provide more fine-grained, real-valued sentiment information than binary polarity labels to guide the refinement process. Experimental results show that the proposed refinement model can improve both conventional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification on the SemEval and Stanford Sentiment Treebank datasets.
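The refinement idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact update rule, which the abstract does not state. The sketch assumes a simple scheme: for each lexicon word, take its semantically nearest words by cosine similarity, re-rank them by how close their intensity scores are, and repeatedly move the word vector toward the top-ranked neighbours. The function name `refine_embeddings`, the parameters `m`, `k`, `alpha`, and `iterations`, the reciprocal-rank weights, and the toy vectors and scores are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def refine_embeddings(vectors, scores, m=10, k=5, alpha=0.5, iterations=10):
    """Illustrative refinement loop (an assumption, not the paper's formula).

    vectors: (n, d) pretrained vectors for the n words in the lexicon.
    scores:  (n,) real-valued sentiment intensity scores for those words.
    m:       number of semantic candidates (nearest by cosine similarity).
    k:       number of candidates kept after re-ranking by score closeness.
    alpha:   step size in (0, 1] for moving a vector toward its neighbours.
    """
    vectors = np.asarray(vectors, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = vectors.shape[0]
    m = min(m, n - 1)  # a word cannot be its own candidate
    k = min(k, m)

    # Semantic similarity is computed once, on the original vectors.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)

    weights = np.zeros((n, n))
    for i in range(n):
        candidates = np.argsort(-sim[i])[:m]
        # Re-rank the semantic candidates by sentiment-score closeness.
        order = np.argsort(np.abs(scores[candidates] - scores[i]))
        neighbours = candidates[order[:k]]
        # Hypothetical weighting: reciprocal of the re-ranked position.
        weights[i, neighbours] = 1.0 / np.arange(1, len(neighbours) + 1)
    weights /= weights.sum(axis=1, keepdims=True)

    refined = vectors.copy()
    for _ in range(iterations):
        # Pull each vector toward the weighted mean of its chosen neighbours.
        refined = (1 - alpha) * refined + alpha * (weights @ refined)
    return refined


# Toy data (invented for illustration): "good" and "bad" start out close.
words = ["good", "bad", "great", "terrible"]
vectors = np.array([[1.0, 0.2], [1.0, 0.0], [0.6, 0.8], [0.6, -0.8]])
scores = np.array([0.8, -0.8, 0.9, -0.9])

refined = refine_embeddings(vectors, scores, m=3, k=1)
print("good/bad before:", cosine(vectors[0], vectors[1]))
print("good/bad after: ", cosine(refined[0], refined[1]))
```

In this sketch, sentimentally dissimilar words are never pushed away explicitly. They only drift apart because re-ranking drops them from the neighbour set, so each vector is pulled toward words with similar intensity scores. `k=1` is used here only because the toy vocabulary has four words; a real lexicon would allow larger neighbourhoods.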