Automatic Construction and Optimization of Sentiment Lexicon Based on Word2Vec

Xiao-ping YANG, Zhong-xia ZHANG, Liang WANG, Yong-jun ZHANG, Qi-feng MA, Jia-nan WU, Yue ZHANG
DOI: https://doi.org/10.11896/j.issn.1002-137X.2017.01.008
2017-01-01
Abstract: The construction of sentiment lexicons plays an important role in text mining. In recent years, lexicon annotation formats have gradually evolved from binary annotation to multi-category annotation, and sentiment lexicons for a single specific domain have attracted growing attention from researchers. However, manual annotation costs too much labor and time, and it is difficult to quantify emotional intensity accurately. Moreover, excessive emphasis on one specific field greatly limits the applicability of domain sentiment lexicons[1]. This paper performs statistical training on a large-scale Chinese corpus with a neural network language model and proposes an automatic method for constructing a multidimensional sentiment lexicon based on Euclidean distance group constraints. To distinguish the sentiment polarities of words that may express either positive or negative meanings in different contexts, we further present a sentiment disambiguation algorithm that increases the flexibility of our lexicon. Lastly, we present a global optimization framework that provides a unified way to combine several human-annotated resources for learning our 10-dimensional sentiment lexicon, SentiRuc. Experiments show the superior performance of the SentiRuc lexicon in the category labeling test, the intensity labeling test, and sentiment classification tasks. Notably, in the intensity labeling test, SentiRuc outperforms the runner-up by 23%.
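The abstract does not spell out how the Euclidean distance constraints are applied, so the following is only a rough sketch of the general idea of scoring a word against groups of seed words in an embedding space. It is not the authors' algorithm. The toy vectors, the two sentiment dimensions, the seed lists, and the helper names `group_distance` and `sentiment_vector` are all assumptions made for illustration; a real setup would load vectors trained with Word2Vec and use ten dimensions.

```python
import math

# Toy 3-d vectors standing in for Word2Vec embeddings (made-up values).
EMBEDDINGS = {
    "happy": (0.9, 0.1, 0.0),
    "joyful": (0.8, 0.2, 0.1),
    "angry": (0.1, 0.9, 0.0),
    "furious": (0.0, 0.8, 0.2),
    "delighted": (0.85, 0.15, 0.05),
}

# Hypothetical seed groups, one per sentiment dimension.
SEEDS = {"joy": ["happy", "joyful"], "anger": ["angry", "furious"]}


def group_distance(word, seeds):
    """Mean Euclidean distance from `word` to a group of seed words."""
    v = EMBEDDINGS[word]
    return sum(math.dist(v, EMBEDDINGS[s]) for s in seeds) / len(seeds)


def sentiment_vector(word):
    """Score per dimension: inverse group distance, normalised to sum to 1."""
    inv = {c: 1.0 / (group_distance(word, s) + 1e-9) for c, s in SEEDS.items()}
    total = sum(inv.values())
    return {c: x / total for c, x in inv.items()}


scores = sentiment_vector("delighted")
```

Here `scores` assigns "delighted" a larger weight on the hypothetical "joy" dimension than on "anger", because its vector lies closer to the joy seeds. Inverse-distance weighting is just one simple way to turn distances into intensities; the paper's constraint formulation may differ.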