Abstract: Multi-modal emotion analysis, an important direction in affective computing, has attracted increasing attention in recent years. Most existing multi-modal emotion recognition studies target a classification task that assigns a specific emotion category to a combination of heterogeneous input data, including multimedia signals and physiological signals. Beyond single-class emotion recognition, a growing body of recent psychological evidence suggests that different discrete emotions may co-exist at the same time. This has motivated mixed-emotion recognition, which aims to identify a mixture of basic emotions. Most current studies treat this as a multi-label classification task. In this work, we instead focus on the challenging situation in which positive and negative emotions are present simultaneously, and we propose a multi-modal mixed emotion recognition framework, EmotionDict. The key characteristics of EmotionDict are as follows. (1) Psychological evidence indicates that such a mixed state can be represented by combinations of basic emotions. Building on this, we treat mixed emotion recognition as a label distribution learning task. We design an emotion dictionary that disentangles each mixed emotion representation into a set of basic emotion elements in a shared latent space together with their corresponding weights, so that the representation is expressed as a weighted sum of those elements. (2) Many existing emotion distribution studies are built on a single type of multimedia signal (such as text, image, audio, or video). We instead incorporate physiological and overt behavioral multi-modal signals, including electroencephalogram (EEG), peripheral physiological signals, and facial videos, which directly reflect subjective emotions. These modalities have diverse characteristics because they relate to the central nervous system, the peripheral nervous system, and the motor cortex, respectively. (3) We further design auxiliary tasks to learn modality attentions for modality integration.
Experiments on two datasets show that our method outperforms existing state-of-the-art approaches on mixed-emotion recognition.
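The minimal NumPy sketch below illustrates two of the mechanisms named in the abstract. The first is attention-weighted fusion of per-modality features. The second is disentangling the fused representation into weighted basic-emotion elements, where the weights are read as an emotion label distribution. This is not the EmotionDict implementation. The following are all illustrative assumptions:

- the softmax-over-similarity weighting;
- the fixed attention logits, which stand in for the attentions the paper learns through auxiliary tasks;
- the KL-divergence label distribution objective;
- the hypothetical names `fuse_modalities`, `decompose`, and `kl_divergence`;
- the dimensions and random data.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def fuse_modalities(features, scores):
    """Hypothetical modality-attention fusion.

    features: (M, d) array, one latent vector per modality (e.g. EEG,
              peripheral physiological signals, facial video).
    scores:   (M,) unnormalised attention logits; here they are given,
              whereas a real model would learn them.
    Returns the attention weights (M,) and the fused vector (d,).
    """
    alpha = softmax(scores)
    return alpha, alpha @ features


def decompose(z, dictionary, temperature=1.0):
    """Express z as a weighted sum of K basic-emotion elements (assumed scheme).

    dictionary: (K, d) array, one row per basic-emotion element.
    The weights are a softmax over dot-product similarities, so they are
    non-negative and sum to 1 and can be read as an emotion distribution.
    """
    w = softmax(dictionary @ z / temperature)
    z_hat = w @ dictionary  # weighted sum of dictionary elements
    return w, z_hat


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) from target distribution p to predicted distribution q;
    # terms with p == 0 contribute nothing and are skipped.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.clip(q[mask], eps, None))))


# Usage with random, hypothetical data.
rng = np.random.default_rng(0)
K, d = 4, 8                        # 4 basic emotions, 8-dim shared latent space
D = rng.normal(size=(K, d))        # emotion dictionary (learned in practice)
feats = rng.normal(size=(3, d))    # stand-ins for EEG, peripheral, face features
alpha, z = fuse_modalities(feats, np.array([0.5, 0.2, 1.0]))
w, z_hat = decompose(z, D)
target = np.array([0.5, 0.0, 0.5, 0.0])  # elements 0 and 2: one positive, one negative
loss = kl_divergence(target, w)          # label distribution learning objective
recon = float(np.sum((z - z_hat) ** 2))  # how well the dictionary reconstructs z
```

Because of the softmax, the predicted weights already form a valid distribution and can be compared directly with an annotated mixed-emotion distribution. A hypothetical target like the one above places mass on a positive and a negative element at the same time, which a single-class label cannot express.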

Emotion Dictionary Learning with Modality Attentions for Mixed Emotion Exploration
