MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition

Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, Jianhua Tao

2024-04-21

Abstract: Multimodal emotion recognition plays a crucial role in enhancing user experience in human-computer interaction. Over the past few decades, researchers have proposed a series of algorithms and achieved impressive progress. Although each method reports superior performance, different methods lack a fair comparison due to inconsistencies in feature extractors, evaluation protocols, and experimental settings. These inconsistencies severely hinder the development of this field. Therefore, we build MERBench, a unified evaluation benchmark for multimodal emotion recognition. We aim to reveal the contribution of some important techniques employed in previous works, such as feature selection, multimodal fusion, robustness analysis, fine-tuning, and pre-training. We hope this benchmark can provide clear and comprehensive guidance for follow-up researchers. Based on the evaluation results of MERBench, we further point out some promising research directions. Additionally, we introduce a new emotion dataset, MER2023, focusing on the Chinese language environment. This dataset can serve as a benchmark dataset for research on multi-label learning, noise robustness, and semi-supervised learning. We encourage follow-up researchers to evaluate their algorithms under the same experimental setup as MERBench for fair comparisons. Our code is available at:
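To make the idea of a fair comparison concrete, here is a minimal Python sketch under stated assumptions: every method is scored on the same test labels, with the same metric, averaged over the same list of random seeds. Weighted-average F1 is used as the example metric; treat it as an illustrative choice, not a specification of MERBench's exact protocol. The helpers `weighted_f1` and `compare_methods` and the "callable that takes a seed" interface are hypothetical, not part of the released code.

```python
from collections import Counter
from statistics import mean


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's support in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total  # weight by class support
    return score


def compare_methods(methods, y_true, seeds):
    """Score every method on the same labels, the same metric, and the same seeds.

    `methods` maps a method name to a callable that takes a seed and returns
    predictions for the shared test set (a hypothetical interface).
    """
    return {
        name: mean(weighted_f1(y_true, predict(seed)) for seed in seeds)
        for name, predict in methods.items()
    }


if __name__ == "__main__":
    y_true = ["happy", "happy", "sad", "angry"]
    methods = {
        "always_happy": lambda seed: ["happy"] * len(y_true),
        "toy_model": lambda seed: ["happy", "sad", "sad", "angry"],
    }
    print(compare_methods(methods, y_true, seeds=[0, 1, 2]))
```

With these toy predictions, `toy_model` scores 0.75 and the constant `always_happy` baseline about 0.33. The point of the design is that the labels, metric, and seeds live outside any single method, so no method can quietly use a different setup.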

Human-Computer Interaction

What problem does this paper attempt to address?

The paper attempts to address the issue of inconsistent evaluation standards in multimodal emotion recognition. Specifically:

1. **Establishing a Unified Evaluation Benchmark**: The paper proposes a unified evaluation benchmark named MERBench, aiming to eliminate inconsistencies in feature extractors, evaluation methods, and experimental settings across different approaches so that they can be compared fairly. Using this benchmark, researchers can study how to select features suitable for different datasets, choose multimodal fusion strategies, improve cross-corpus performance, and enhance noise robustness.
2. **New Dataset MER2023**: To further support research on multi-label learning, noise robustness, and semi-supervised learning, the authors also construct a new Chinese emotion dataset named MER2023. It includes three subsets: a multi-label subset for studying the correlation between discrete and dimensional labels, a noise subset for evaluating noise robustness, and an unlabeled subset for studying semi-supervised learning.
3. **Guiding Subsequent Research**: Through a systematic evaluation of existing techniques, the paper aims to provide clear and comprehensive guidance for subsequent researchers, and it encourages them to evaluate their algorithms under the same experimental setup so that results remain comparable.
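The noise subset described above is meant for evaluating noise robustness. One common way to build corrupted audio, assumed here purely for illustration (this summary does not say how the MER2023 noise was generated), is to mix a noise recording into clean audio at a chosen signal-to-noise ratio (SNR). The noise is rescaled so that 10 * log10(P_signal / P_noise) equals the target SNR in dB. The helpers `add_noise_at_snr` and `measured_snr_db` are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Return signal + scaled noise so that the mixture has the requested SNR in dB."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[: len(signal)]
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Target noise power follows from SNR_dB = 10 * log10(P_signal / P_noise).
    target_p_noise = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_p_noise / p_noise)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, noisy):
    """Recover the SNR by treating (noisy - signal) as the added noise."""
    residual = [y - s for s, y in zip(signal, noisy)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    sample_rate = 16000
    # One second of a 440 Hz tone standing in for clean speech.
    clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    white = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]
    for snr in (20, 10, 0):
        noisy = add_noise_at_snr(clean, white, snr)
        print(snr, round(measured_snr_db(clean, noisy), 3))
```

Sweeping the SNR from high to low and re-running the same evaluation at each level gives a simple robustness curve; the same sketch works for any additive noise source, not only white noise.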

Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

A Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Explainable Multimodal Emotion Reasoning: a Promising Way to Open-set Emotion Recognition

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Improving Multimodal Emotion Recognition by Leveraging Acoustic Adaptation and Visual Alignment

MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis

Bridging the Emotional Semantic Gap via Multimodal Relevance Estimation

Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors

Multiplex graph aggregation and feature refinement for unsupervised incomplete multimodal emotion recognition

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

A Multimodal Dataset for Mixed Emotion Recognition

Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model

EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Multimodal emotion recognition based on audio and text by using hybrid attention networks

Generating and encouraging: An effective framework for solving class imbalance in multimodal emotion recognition conversation