Abstract: With the release of a growing number of open-source emotion recognition datasets on social media platforms and the rapid development of computing resources, multimodal emotion recognition (MER) has begun to receive widespread research attention. The MER task extracts and fuses complementary semantic information from different modalities in order to classify the speaker's emotions. However, existing feature fusion methods usually map the features of different modalities into the same feature space for information fusion, which cannot eliminate the heterogeneity between modalities. This heterogeneity makes it challenging to learn emotion class boundaries in the subsequent stage. To tackle these problems, we propose a novel Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition (AR-IIGCN) method. First, we feed video, audio, and text features into a multi-layer perceptron (MLP) to map them into separate feature spaces. Second, we build a generator and a discriminator for the three modal features through adversarial representation learning, which enables information interaction between modalities and eliminates heterogeneity among them. Third, we introduce graph contrastive representation learning to capture intra-modal and inter-modal complementary semantic information and to learn intra-class and inter-class boundary information for emotion categories. Specifically, we construct a graph structure over the three modal features and perform contrastive representation learning on two kinds of node pairs: nodes with different emotions in the same modality, and nodes with the same emotion in different modalities. This improves the representation ability of the node features. Extensive experiments show that AR-IIGCN significantly improves emotion recognition accuracy on the IEMOCAP and MELD datasets.
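The intra-modal and inter-modal contrastive objective described in the abstract can be illustrated with a minimal pure-Python sketch. The sketch follows the abstract's pairing rule:

- Positives for an anchor node are nodes with the same emotion label in a different modality.
- Negatives are nodes with a different emotion label in the same modality.

Several parts are illustrative assumptions, not the AR-IIGCN authors' implementation: the InfoNCE-style loss form, cosine similarity, the temperature `tau`, and the names `Node`, `cosine`, and `intra_inter_contrastive_loss`. Graph construction, the MLP projection, and the adversarial generator/discriminator are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical node representation: (modality, emotion label, embedding).
Node = Tuple[str, int, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def intra_inter_contrastive_loss(nodes: List[Node], tau: float = 0.5) -> float:
    """Illustrative InfoNCE-style loss over graph nodes (not the paper's code).

    Positives of anchor i: same emotion label, different modality (inter-modal).
    Negatives of anchor i: different emotion label, same modality (intra-modal).
    Returns the mean loss over anchors that have at least one positive.
    """
    total, counted = 0.0, 0
    for mod_i, lab_i, z_i in nodes:
        pos = [cosine(z_i, z) / tau for m, l, z in nodes if l == lab_i and m != mod_i]
        neg = [cosine(z_i, z) / tau for m, l, z in nodes if l != lab_i and m == mod_i]
        if not pos:
            continue  # anchor has no cross-modal partner with the same emotion
        logits = pos + neg
        mx = max(logits)  # subtract the max to stabilize the log-sum-exp
        log_denom = mx + math.log(sum(math.exp(s - mx) for s in logits))
        # Average -log(exp(s_p) / sum(exp(s))) over the anchor's positives.
        total += sum(log_denom - s for s in pos) / len(pos)
        counted += 1
    return total / counted if counted else 0.0


# Toy check: cross-modal embeddings that agree on emotion give a lower loss.
aligned = [("text", 0, [1.0, 0.0]), ("audio", 0, [0.9, 0.1]),
           ("text", 1, [0.0, 1.0]), ("audio", 1, [0.1, 0.9])]
mixed = [("text", 0, [1.0, 0.0]), ("audio", 0, [0.0, 1.0]),
         ("text", 1, [0.0, 1.0]), ("audio", 1, [1.0, 0.0])]
print(intra_inter_contrastive_loss(aligned) < intra_inter_contrastive_loss(mixed))  # True
```

The sketch restricts each anchor's denominator to exactly the two pair types the abstract names. A standard supervised contrastive loss would instead contrast each anchor against all other nodes in the graph.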

A Unimodal Valence-Arousal Driven Contrastive Learning Framework for Multimodal Multi-Label Emotion Recognition

Fine-grained Disentangled Representation Learning for Multimodal Emotion Recognition

UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition

A Versatile Multimodal Learning Framework For Zero-shot Emotion Recognition

Tailor Versatile Multi-Modal Learning for Multi-Label Emotion Recognition

Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model

Learning Robust Multi-Modal Representation for Multi-Label Emotion Recognition Via Adversarial Masking and Perturbation

MMER: Multimodal Multi-task Learning for Speech Emotion Recognition

Image-Text Multimodal Emotion Classification via Multi-View Attentional Network

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition

First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Emotion recognition based on multi-modal electrophysiology multi-head attention Contrastive Learning

Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better

Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism

Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model

Contrastive Learning based Modality-Invariant Feature Acquisition for Robust Multimodal Emotion Recognition with Missing Modalities

Adversarial Representation with Intra-Modal and Inter-Modal Graph Contrastive Learning for Multimodal Emotion Recognition

A Two-Stage Multimodal Emotion Recognition Model Based on Graph Contrastive Learning