Abstract:
Goal: As an essential human-machine interaction task, emotion recognition has attracted growing attention over the past decades. Although previous attempts to classify emotions have achieved high performance, several challenges remain open: 1) how to effectively recognize emotions by combining different modalities; and 2) given the increasing computing power required by deep learning, how to provide real-time detection while improving the robustness of deep neural networks.

Method: In this paper, we propose a deep learning-based multimodal emotion recognition (MER) framework called Deep-Emotion, which adaptively integrates the most discriminative features from facial expressions, speech, and electroencephalogram (EEG) signals to improve MER performance. Specifically, the Deep-Emotion framework consists of three branches: a facial branch, a speech branch, and an EEG branch. The facial branch uses an improved GhostNet, proposed in this paper, for feature extraction; compared with the original GhostNet, it effectively alleviates overfitting during training and improves classification accuracy. For the speech branch, we propose a lightweight fully convolutional neural network (LFCNN) for efficient extraction of speech emotion features. For the EEG branch, we propose a tree-like LSTM (tLSTM) model capable of fusing multi-stage features for EEG emotion feature extraction. Finally, we adopt decision-level fusion to integrate the recognition results of the three modalities, yielding more comprehensive and accurate recognition.

Results and Conclusions: Extensive experiments on the CK+, EMO-DB, and MAHNOB-HCI datasets demonstrate the effectiveness of the proposed Deep-Emotion method, as well as the feasibility and superiority of the MER approach.
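The abstract does not state which decision-level fusion rule Deep-Emotion uses. One common choice is a weighted average of each branch's class probabilities followed by an argmax; the Python sketch below illustrates that rule under stated assumptions. The three-class label set, the branch names (`face`, `speech`, `eeg`), the weights, and the example scores are all hypothetical, not values from the paper.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical shared label set (assumption): CK+, EMO-DB and MAHNOB-HCI
# use different label sets, so a common 3-class set is only illustrative.
LABELS = ["negative", "neutral", "positive"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one branch's raw class scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(branch_probs: Dict[str, List[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Decision-level fusion as a weighted average of per-branch probabilities.

    Weights are renormalized over the branches actually present, so a
    missing modality can be handled by omitting it from branch_probs.
    """
    total_w = sum(weights[name] for name in branch_probs)
    n_classes = len(next(iter(branch_probs.values())))
    fused = [0.0] * n_classes
    for name, probs in branch_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"branch {name!r} has {len(probs)} classes, expected {n_classes}")
        for k, p in enumerate(probs):
            fused[k] += weights[name] * p / total_w
    return fused


def predict(fused: List[float]) -> str:
    """Return the label with the highest fused probability."""
    return LABELS[max(range(len(fused)), key=fused.__getitem__)]


if __name__ == "__main__":
    # Hypothetical raw scores from the facial, speech and EEG branches.
    branch_probs = {
        "face": softmax([2.0, 0.5, 0.1]),
        "speech": softmax([0.2, 1.5, 0.3]),
        "eeg": softmax([1.0, 0.2, 0.8]),
    }
    weights = {"face": 0.4, "speech": 0.3, "eeg": 0.3}  # hypothetical weights
    fused = fuse_decisions(branch_probs, weights)
    print(predict(fused))  # prints: negative
```

Fusing at the decision level keeps the three branches independent, so each can be trained separately on its own modality; the trade-off is that interactions between modalities are not modeled below the level of the final class scores.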

Dense Graph Convolutional with Joint Cross-Attention Network for Multimodal Emotion Recognition

MMGCN: Multimodal Fusion Via Deep Graph Convolution Network for Emotion Recognition in Conversation

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Multimodal emotion recognition with capsule graph convolutional based representation fusion

DER-GCN: Dialogue and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialogue Emotion Recognition

DGSNet: Dual Graph Structure Network for Emotion Recognition in Multimodal Conversations

Multimodal Emotion Recognition Using Deep Generalized Canonical Correlation Analysis with an Attention Mechanism

Multiplex graph aggregation and feature refinement for unsupervised incomplete multimodal emotion recognition

Correlation-Driven Multi-Modality Graph Decomposition for Cross-Subject Emotion Recognition

A joint hierarchical cross‐attention graph convolutional network for multi‐modal facial expression recognition

Dynamic Graph Neural Ordinary Differential Equation Network for Multi-modal Emotion Recognition in Conversation

Masked Graph Learning with Recurrent Alignment for Multimodal Emotion Recognition in Conversation

DER-GCN: Dialog and Event Relation-Aware Graph Convolutional Neural Network for Multimodal Dialog Emotion Recognition

MLGAT: multi-layer graph attention networks for multimodal emotion recognition in conversations

Context- and Knowledge-Aware Graph Convolutional Network for Multimodal Emotion Recognition

MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals

Multimodal Fusion via Hypergraph Autoencoder and Contrastive Learning for Emotion Recognition in Conversation

A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning

A Novel and Powerful Dual-Stream Multi-Level Graph Convolution Network for Emotion Recognition

GraphMFT: A Graph Network based Multimodal Fusion Technique for Emotion Recognition in Conversation