Chinese EmoBank: Building Valence-Arousal Resources for Dimensional Sentiment Analysis

Lung-Hao Lee, Jian-Hong Li, Liang-Chih Yu
DOI: https://doi.org/10.1145/3489141
2022-07-31
Abstract: An increasing amount of research has recently focused on dimensional sentiment analysis that represents affective states as continuous numerical values on multiple dimensions, such as valence-arousal (VA) space. Compared to the categorical approach that represents affective states as distinct classes (e.g., positive and negative), the dimensional approach can provide more fine-grained (real-valued) sentiment analysis. However, dimensional sentiment resources with valence-arousal ratings are very rare, especially for the Chinese language. Therefore, this study aims to: (1) Build a Chinese valence-arousal resource called Chinese EmoBank, the first Chinese dimensional sentiment resource featuring various levels of text granularity including 5,512 single words, 2,998 multi-word phrases, 2,582 single sentences, and 2,969 multi-sentence texts. The valence-arousal ratings are annotated by crowdsourcing based on the Self-Assessment Manikin (SAM) rating scale. A corpus cleanup procedure is then performed to improve annotation quality by removing outlier ratings and improper texts. (2) Evaluate the proposed resource using different categories of classifiers such as lexicon-based, regression-based, and neural-network-based methods, and compare their performance to a similar evaluation of an English dimensional sentiment resource.
computer science, artificial intelligence
What problem does this paper attempt to address?
This paper addresses the scarcity of Chinese resources for dimensional sentiment analysis. Most existing Chinese sentiment resources take the categorical approach, assigning affective states to a small number of discrete classes (such as positive, neutral, and negative), which limits how finely sentiment can be distinguished. Dimensional sentiment analysis instead represents affective states as continuous values on multiple dimensions, such as the valence-arousal (VA) space, and so supports more fine-grained, real-valued analysis. To fill this gap, the authors build Chinese EmoBank, which they describe as the first Chinese dimensional sentiment resource covering several levels of text granularity: 5,512 single words, 2,998 multi-word phrases, 2,582 single sentences, and 2,969 multi-sentence texts. The valence-arousal ratings are collected by crowdsourcing using the Self-Assessment Manikin (SAM) rating scale, and a corpus cleanup procedure then removes outlier ratings and improper texts to improve annotation quality. Finally, the authors evaluate the resource with lexicon-based, regression-based, and neural-network-based methods and compare the results with a similar evaluation of an English dimensional sentiment resource. The work thus both supplies a missing Chinese dimensional sentiment resource and provides an empirical assessment of it.
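The cleanup step for outlier ratings can be illustrated with a minimal sketch. The text above does not specify the exact rule, so everything here is an assumption made for illustration: the function names `clean_ratings` and `aggregate` are hypothetical, and the cutoff of 1.5 standard deviations from the mean is an assumed threshold, not a value taken from this summary. The sample ratings are invented.

```python
from statistics import mean, pstdev


def clean_ratings(ratings, k=1.5):
    """Drop ratings that lie more than k standard deviations from the mean.

    The threshold k=1.5 is an assumption for illustration only.
    """
    mu = mean(ratings)
    sigma = pstdev(ratings)  # population standard deviation of the raw ratings
    if sigma == 0:
        # All annotators agree, so there is nothing to remove.
        return list(ratings)
    return [r for r in ratings if abs(r - mu) <= k * sigma]


def aggregate(ratings, k=1.5):
    """Average the ratings that survive outlier removal."""
    return mean(clean_ratings(ratings, k))


# Invented valence ratings for one text item from six annotators.
valence_ratings = [7, 7, 8, 6, 7, 1]
print(clean_ratings(valence_ratings))  # the rating of 1 is dropped
print(aggregate(valence_ratings))
```

In this sketch each item (word, phrase, sentence, or multi-sentence text) would be cleaned independently for valence and for arousal, and the mean of the remaining ratings would serve as the item's final score.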