Abstract: Emotion and personality are central elements in understanding human psychological states. Emotions reflect an individual's subjective experiences, while personality reveals relatively stable behavioral and cognitive patterns. Existing affective computing datasets often annotate emotion and personality traits separately, and they lack fine-grained labeling of micro-emotions and emotion intensity in both single-label and multi-label classification. Chinese emotion datasets are extremely scarce, and datasets capturing the personality traits of Chinese users are even more limited. To address these gaps, this study collected data from the major social media platform Weibo, screening 11,338 valid users from over 50,000 individuals with diverse MBTI personality labels and acquiring 566,900 posts along with the users' MBTI personality tags. Using the EQN method, we compiled a multi-label Chinese affective computing dataset that integrates the same user's personality traits with six emotions and micro-emotions, each annotated with intensity levels. Validation results across multiple NLP classification models demonstrate the dataset's strong utility. This dataset is designed to advance machine recognition of complex human emotions and provide data support for research in psychology, education, marketing, finance, and politics.

What problem does this paper attempt to address?

The paper addresses several key deficiencies in how existing affective computing datasets label emotions and personality traits:

1. **Lack of fine-grained emotion labeling**: Existing affective computing datasets usually label emotions and personality traits separately, without fine-grained labeling of micro-emotions and their intensities.
2. **Lack of multi-label classification**: Most emotion datasets support only single-label classification, ignoring cases where multiple emotions co-exist in the same text.
3. **Lack of quantification of emotion intensity**: Existing datasets often label only the presence or absence of an emotion, without giving a numerical value for its intensity.
4. **Lack of Chinese datasets**: High-quality emotion datasets for Chinese social media users are very scarce, and few datasets contain both personality trait and emotion labels.

To solve these problems, the paper proposes constructing a multi-label affective computing dataset (CMACD) based on users of the Chinese social media platform Weibo. This dataset has the following characteristics:

- **Integration of personality traits and emotions**: Each user's Weibo post is labeled not only with the emotion category but also with the user's MBTI personality type.
- **Fine-grained emotion labeling**: Each emotion is labeled with an intensity value ranging from 0 to 1, where 0 indicates the absence of the emotion and 1 indicates the maximum emotion intensity.
- **Multi-label classification**: A post can be labeled with multiple emotions, reflecting the complexity and diversity of human emotions.
- **Large-scale dataset**: The dataset contains 11,338 valid users and 566,900 Weibo posts, covering 16 MBTI personality types.

Through these improvements, the CMACD dataset aims to improve machines' ability to understand complex human emotions and to provide data support for research in fields such as psychology, education, marketing, finance, and politics.
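The labeling scheme described above (one MBTI type per post, plus an intensity in [0, 1] for each emotion, with several emotions allowed at once) can be illustrated with a minimal Python sketch. This is not the dataset's actual schema: the class name `LabeledPost`, its field names, and the six emotion names are assumptions made for illustration, since the summary says "six emotions" without listing them.

```python
from dataclasses import dataclass

# Hypothetical emotion names: the summary mentions six emotions but does not name them.
EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "disgust")

# The 16 MBTI types are the combinations of four binary letters.
MBTI_TYPES = {a + b + c + d for a in "EI" for b in "SN" for c in "TF" for d in "JP"}


@dataclass
class LabeledPost:
    """One post with its author's MBTI type and per-emotion intensities (hypothetical schema)."""
    text: str
    mbti: str
    intensities: dict  # emotion name -> intensity in [0, 1]; missing emotions count as 0

    def __post_init__(self):
        if self.mbti not in MBTI_TYPES:
            raise ValueError(f"unknown MBTI type: {self.mbti}")
        for emotion, value in self.intensities.items():
            if emotion not in EMOTIONS:
                raise ValueError(f"unknown emotion: {emotion}")
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"intensity out of range for {emotion}: {value}")

    def intensity_vector(self):
        """Intensities in the fixed EMOTIONS order; 0.0 means the emotion is absent."""
        return [self.intensities.get(e, 0.0) for e in EMOTIONS]

    def multi_hot(self, threshold=0.0):
        """Multi-label target: 1 for every emotion whose intensity exceeds the threshold."""
        return [1 if v > threshold else 0 for v in self.intensity_vector()]


post = LabeledPost(
    text="(example post text)",
    mbti="INFP",
    intensities={"joy": 0.7, "surprise": 0.3},
)
print(post.intensity_vector())
print(post.multi_hot())
```

Keeping the intensities as a fixed-order vector means the same record can serve as a regression target (the intensities themselves) or as a multi-label classification target (the thresholded vector), which covers both uses the summary describes.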

A Chinese Multi-label Affective Computing Dataset Based on Social Media Network Users

Emotion Detection in Online Social Networks: A Multilabel Learning Approach

A Novel Emotion Lexicon for Chinese Emotional Expression Analysis on Weibo: Using Grounded Theory and Semi-Automatic Methods

HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild

Multi-label Emotion Classification for Tweets in Weibo: Method and Application

CMMA: Benchmarking Multi-Affection Detection in Chinese Multi-Modal Conversations

Establishing a Large Scale Dataset for Image Emotion Analysis Using Chinese Emotion Ontology

A Novel Calibrated Label Ranking Based Method for Multiple Emotions Detection in Chinese Microblogs

A Multimodal Dataset for Mixed Emotion Recognition

An Entropy-Based Method with a New Benchmark Dataset for Chinese Textual Affective Structure Analysis

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

Content-based Emotion Classification in Online Social Networks for Chinese Microblogs

A Large Finer-grained Affective Computing EEG Dataset

Complex emotion categorization and tagging for Chinese

Building a Chinese Natural Emotional Audio-Visual Database

Construction and Application of Chinese Emotional Corpus

MES-P: an Emotional Tonal Speech Dataset in Mandarin Chinese with Distal and Proximal Labels

An EEG-Based Multi-Modal Emotion Database with Both Posed and Authentic Facial Actions for Emotion Analysis

MPED: A Multi-Modal Physiological Emotion Database for Discrete Emotion Recognition