A Chinese Multi-label Affective Computing Dataset Based on Social Media Network Users

Jingyi Zhou, Senlin Luo, Haofan Chen
2024-11-13
Abstract: Emotion and personality are central elements in understanding human psychological states. Emotions reflect an individual's subjective experiences, while personality reveals relatively stable behavioral and cognitive patterns. Existing affective computing datasets often annotate emotion and personality traits separately, and they lack fine-grained labeling of micro-emotions and emotion intensity in both single-label and multi-label classification. Chinese emotion datasets are extremely scarce, and datasets capturing the personality traits of Chinese users are even more limited. To address these gaps, this study collected data from the major social media platform Weibo, screening 11,338 valid users from over 50,000 individuals with diverse MBTI personality labels and acquiring 566,900 posts along with the users' MBTI personality tags. Using the EQN method, we compiled a multi-label Chinese affective computing dataset that integrates each user's personality traits with six emotions and micro-emotions, each annotated with intensity levels. Validation results across multiple NLP classification models demonstrate the dataset's strong utility. This dataset is designed to advance machine recognition of complex human emotions and provide data support for research in psychology, education, marketing, finance, and politics.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language, Computers and Society
What problem does this paper attempt to address?
This paper addresses several key deficiencies in how existing affective computing datasets label emotions and personality traits:

1. **Lack of fine-grained emotion labeling**: Existing affective computing datasets usually label emotions and personality traits separately, without fine-grained labeling of micro-emotions and their intensities.
2. **Lack of multi-label classification**: Most emotion datasets support only single-label classification, ignoring cases where multiple emotions co-exist in the same text.
3. **Lack of quantification of emotion intensity**: Existing datasets often label only the presence or absence of an emotion, without providing a numerical value for its intensity.
4. **Lack of Chinese datasets**: High-quality emotion datasets for Chinese social media users are very scarce, and few datasets contain both personality trait labels and emotion labels.

To solve these problems, the paper proposes constructing a multi-label affective computing dataset (CMACD) based on Chinese social media (Weibo) users. This dataset has the following characteristics:

- **Integration of personality traits and emotions**: Each Weibo post is labeled not only with its emotion categories but also with the posting user's MBTI personality type.
- **Fine-grained emotion labeling**: Each emotion is labeled with an intensity value ranging from 0 to 1, where 0 indicates the absence of the emotion and 1 indicates maximum intensity.
- **Multi-label classification**: A post can be labeled with multiple emotions, reflecting the complexity and diversity of human emotions.
- **Large-scale dataset**: The dataset contains 11,338 valid users and 566,900 Weibo posts, covering 16 MBTI personality types.
Through these improvements, the CMACD dataset aims to improve machines' ability to understand complex human emotions and to provide data support for research in fields such as psychology, education, marketing, finance, and politics.
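To make the labeling scheme described above more concrete, the following is a minimal Python sketch of how one labeled post might be represented: an MBTI type plus a per-emotion intensity in the range 0 to 1, where several emotions can be non-zero at once. This is not the dataset's actual schema. The class name `LabeledPost`, its field names, and in particular the six emotion names in `EMOTIONS` are assumptions chosen for illustration, since the summary does not list the emotion categories.

```python
from dataclasses import dataclass
from itertools import product

# ASSUMPTION: the summary mentions "six emotions" but does not name them.
# These six names are placeholders for illustration only.
EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "disgust")

# The 16 MBTI types are the combinations of four binary letter choices.
MBTI_TYPES = {"".join(letters) for letters in product("EI", "SN", "TF", "JP")}


@dataclass
class LabeledPost:
    """Hypothetical record: one post with a personality label and emotion intensities."""
    text: str
    mbti: str
    intensities: dict  # emotion name -> intensity in [0, 1]; missing means 0

    def __post_init__(self):
        if self.mbti not in MBTI_TYPES:
            raise ValueError(f"unknown MBTI type: {self.mbti!r}")
        for name, value in self.intensities.items():
            if name not in EMOTIONS:
                raise ValueError(f"unknown emotion: {name!r}")
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"intensity for {name!r} must be in [0, 1]")

    def to_vector(self):
        """Fixed-order intensity vector, usable as a multi-label regression target."""
        return [self.intensities.get(name, 0.0) for name in EMOTIONS]

    def active_labels(self, threshold=0.0):
        """Emotions whose intensity exceeds the threshold (the multi-label set)."""
        return [n for n in EMOTIONS if self.intensities.get(n, 0.0) > threshold]


# Usage: a single post carrying two co-occurring emotions.
post = LabeledPost("example post text", "INTJ", {"joy": 0.7, "surprise": 0.3})
print(post.to_vector())
print(post.active_labels())
```

Storing intensities as a fixed-order vector keeps single-label classification (take the largest entry), multi-label classification (threshold each entry), and intensity regression (use the raw values) available from the same record.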