Abstract: Goal: As an essential task in human-machine interaction, emotion recognition has emerged as an active research area over the past decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) Effectively recognizing emotions from different modalities remains difficult. 2) Because deep learning requires an increasing amount of computing power, providing real-time detection and improving the robustness of deep neural networks is important. Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) method called Deep-Emotion, which adaptively integrates the most discriminating features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. Specifically, the proposed Deep-Emotion framework consists of three branches: the facial branch, the speech branch, and the EEG branch. The facial branch uses an improved GhostNet neural network proposed in this paper for feature extraction, which effectively alleviates overfitting during training and improves classification accuracy compared with the original GhostNet. For the speech branch, this paper proposes a lightweight fully convolutional neural network (LFCNN) for efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt a decision-level fusion strategy to integrate the recognition results of the three modalities, resulting in more comprehensive and accurate performance. Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the advantages of the proposed Deep-Emotion method, as well as the feasibility and superiority of the MER approach.
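
The abstract states that the three branch outputs are combined by decision-level fusion but does not give the fusion rule. As a minimal sketch, assuming each branch emits a probability vector over the same emotion classes and that fusion is a weighted average of those vectors (an assumption, not necessarily the rule used in Deep-Emotion), the idea might look like this. The function names, branch names, weights, and probabilities are all hypothetical.

```python
from typing import Dict, List, Optional


def fuse_decisions(
    branch_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-branch class probabilities (assumed fusion rule)."""
    if not branch_probs:
        raise ValueError("at least one branch is required")
    if weights is None:
        # Equal weights by default; a real system might tune these on validation data.
        weights = {name: 1.0 for name in branch_probs}
    n_classes = len(next(iter(branch_probs.values())))
    total = sum(weights[name] for name in branch_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * n_classes
    for name, probs in branch_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"branch {name!r} has a different number of classes")
        for i, p in enumerate(probs):
            # Normalise weights so the fused vector still sums to 1.
            fused[i] += weights[name] / total * p
    return fused


def predict(fused: List[float]) -> int:
    """Index of the most probable class after fusion."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs over three classes from each branch.
example = {
    "face": [0.6, 0.3, 0.1],
    "speech": [0.2, 0.5, 0.3],
    "eeg": [0.3, 0.4, 0.3],
}
fused_example = fuse_decisions(example)
label_example = predict(fused_example)
```

Because fusion happens on the outputs rather than on intermediate features, each branch can be trained and replaced independently; the cost is that cross-modal interactions inside the networks are not modelled.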

Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels

JDAT: Joint-Dimension-Aware Transformer with Strong Flexibility for EEG Emotion Recognition

SEED-VII: A Multimodal Dataset of Six Basic Emotions with Continuous Labels for Emotion Recognition

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Research on Multimodal Emotion Recognition Based on Fusion of Electroencephalogram and Electrooculography

Transformer-Based Multimodal Emotional Perception for Dynamic Facial Expression Recognition in the Wild

Temporal aware Mixed Attention-based Convolution and Transformer Network for cross-subject EEG emotion recognition

An End-to-End Transformer with Progressive Tri-Modal Attention for Multi-modal Emotion Recognition

Multimodal Neurophysiological Transformer for Emotion Recognition

Emotion recognition based on multi-modal electrophysiology multi-head attention Contrastive Learning

Emotion Recognition Using Cross-Modal Attention from EEG and Facial Expression

Functional Emotion Transformer for EEG-Assisted Cross-Modal Emotion Recognition

Multimodal Emotion Recognition From EEG Signals and Facial Expressions

MPED: A Multi-Modal Physiological Emotion Database for Discrete Emotion Recognition

MindLink-Eumpy: An Open-Source Python Toolbox for Multimodal Emotion Recognition

HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Multimodal Emotion Recognition Based on EEG and EOG Signals Evoked by the Video-Odor Stimuli

A Multimodal Dataset for Mixed Emotion Recognition

Multimodal Multi-View Spectral-Spatial-Temporal Masked Autoencoder for Self-Supervised Emotion Recognition

Joint low-rank tensor fusion and cross-modal attention for multimodal physiological signals based emotion recognition