Abstract: Multi-modal emotion analysis, an important direction in affective computing, has attracted increasing attention in recent years. Most existing multi-modal emotion recognition studies address a classification task that assigns a single emotion category to a combination of heterogeneous inputs, including multimedia signals and physiological signals. In contrast to this single-class view, a growing body of psychological evidence suggests that different discrete emotions may co-exist, which has motivated mixed-emotion recognition: identifying a mixture of basic emotions. Most current studies treat mixed-emotion recognition as a multi-label classification task. In this work, we instead focus on the challenging situation in which positive and negative emotions are present simultaneously, and propose a multi-modal mixed-emotion recognition framework named EmotionDict. Its key characteristics are as follows. (1) Inspired by psychological evidence that such a mixed state can be represented by combinations of basic emotions, we address mixed-emotion recognition as a label distribution learning task. An emotion dictionary is designed to disentangle each mixed-emotion representation into a set of basic emotion elements in a shared latent space and their corresponding weights, so that the representation is a weighted sum of those elements. (2) While many existing emotion distribution studies are built on a single type of multimedia signal (such as text, image, audio, or video), we incorporate physiological and overt behavioral signals, including electroencephalogram (EEG), peripheral physiological signals, and facial videos, which directly reflect subjective emotions. These modalities have diverse characteristics because they relate to the central nervous system, the peripheral nervous system, and the motor cortex. (3) We further design auxiliary tasks to learn modality attentions for modality integration. Experiments on two datasets show that our method outperforms existing state-of-the-art approaches on mixed-emotion recognition.
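
The weighted-sum decomposition in (1) and the attention-based integration in (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how weights are computed or how the model is trained, so the softmax-over-similarity weighting, the KL-divergence objective (a common choice in label distribution learning), and every name and number below (`fuse_modalities`, `emotion_distribution`, the toy dictionary, the attention logits) are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def fuse_modalities(features, attention_logits):
    """Attention-weighted sum of per-modality vectors in a shared latent space.

    Assumption: each modality has already been projected to the same dimension.
    """
    weights = softmax(attention_logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

def emotion_distribution(z, dictionary):
    """Weights over basic-emotion atoms (assumed: softmax of similarities)."""
    return softmax([dot(z, atom) for atom in dictionary])

def reconstruct(weights, dictionary):
    """Weighted sum of dictionary atoms, i.e. the disentangled representation."""
    dim = len(dictionary[0])
    return [sum(w * atom[i] for w, atom in zip(weights, dictionary)) for i in range(dim)]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), a typical label-distribution-learning loss."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted) if t > 0)

# Hypothetical toy data: three modalities, a 3-d latent space, three atoms.
eeg, peripheral, face = [0.9, 0.1, 0.0], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

z = fuse_modalities([eeg, peripheral, face], attention_logits=[0.5, 0.2, 0.3])
weights = emotion_distribution(z, atoms)       # predicted emotion distribution
z_hat = reconstruct(weights, atoms)            # weighted sum of basic elements
loss = kl_divergence([0.5, 0.4, 0.1], weights) # against a made-up target
print(weights, z_hat, loss)
```

In this sketch the dictionary weights double as the predicted emotion distribution, which is one way a mixed positive-and-negative state could be expressed as proportions of basic emotions rather than as independent binary labels.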

Lightly-supervised Utterance-Level Emotion Identification Using Latent Topic Modeling of Multimodal Words

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Multimodal Latent Emotion Recognition from Micro-expression and Physiological Signals

Learning Fine-Grained Cross Modality Excitement for Speech Emotion Recognition

Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition

Self-adaptive Context and Modal-interaction Modeling For Multimodal Emotion Recognition

Multimodal Emotion Recognition by Extracting Common and Modality-Specific Information

Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances

Multimodal Emotion Recognition Based on Feature Selection and Extreme Learning Machine in Video Clips

Utterance Independent Bimodal Emotion Recognition in Spontaneous Communication

Multimodal interaction enhanced representation learning for video emotion recognition

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Simplifying Multimodal Emotion Recognition with Single Eye Movement Modality

Emotion Dictionary Learning with Modality Attentions for Mixed Emotion Exploration

Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

A robust multimodal approach for emotion recognition

Multimodal Sentiment Sensing and Emotion Recognition Based on Cognitive Computing Using Hidden Markov Model with Extreme Learning Machine

Multimodal Knowledge-enhanced Interactive Network with Mixed Contrastive Learning for Emotion Recognition in Conversation

Emotion Recognition in Speech with Latent Discriminative Representations Learning