MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition

Zheng Lian, Licai Sun, Yong Ren, Hao Gu, Haiyang Sun, Lan Chen, Bin Liu, Jianhua Tao
2024-04-21
Abstract: Multimodal emotion recognition plays a crucial role in enhancing user experience in human-computer interaction. Over the past few decades, researchers have proposed a series of algorithms and achieved impressive progress. Although each method reports superior performance, different methods lack a fair comparison due to inconsistencies in feature extractors, evaluation manners, and experimental settings. These inconsistencies severely hinder the development of this field. Therefore, we build MERBench, a unified evaluation benchmark for multimodal emotion recognition. We aim to reveal the contribution of some important techniques employed in previous works, such as feature selection, multimodal fusion, robustness analysis, fine-tuning, and pre-training. We hope this benchmark can provide clear and comprehensive guidance for follow-up researchers. Based on the evaluation results of MERBench, we further point out some promising research directions. Additionally, we introduce a new emotion dataset, MER2023, focusing on the Chinese language environment. This dataset can serve as a benchmark dataset for research on multi-label learning, noise robustness, and semi-supervised learning. We encourage follow-up researchers to evaluate their algorithms under the same experimental setup as MERBench for fair comparisons. Our code is available at:
Human-Computer Interaction
What problem does this paper attempt to address?
The paper attempts to address the issue of inconsistent evaluation standards in the field of multimodal emotion recognition. Specifically:

1. **Establishing a Unified Evaluation Benchmark**: The paper proposes a unified evaluation benchmark named MERBench, which removes inconsistencies in feature extractors, evaluation methods, and experimental settings across different approaches, thereby enabling fair comparisons. With this benchmark, researchers can study how to select features suited to different datasets, choose multimodal fusion strategies, improve cross-corpus performance, and enhance noise robustness.
2. **New Dataset MER2023**: To further support research on multi-label learning, noise robustness, and semi-supervised learning, the authors also construct a new Chinese emotion dataset named MER2023. It includes three subsets: a multi-label subset for studying the correlation between discrete and dimensional labels, a noise subset for evaluating noise robustness, and an unlabeled subset for studying semi-supervised learning.
3. **Guiding Subsequent Research**: Through a systematic evaluation of existing techniques, the paper aims to give subsequent researchers clear and comprehensive guidance, and it encourages them to evaluate their algorithms under the same experimental setup so that results remain comparable.
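The core idea of a unified benchmark is that every method is scored on identical data splits with one fixed metric, so that differences in results reflect the methods rather than the evaluation setup. The sketch below illustrates that idea only; it is not MERBench's code. The metric (support-weighted F1), the `Method` interface, `evaluate_all`, the two toy methods, and the data are all hypothetical choices made for illustration, since the summary above does not specify MERBench's metrics or APIs.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a method receives labeled training pairs and
# unlabeled test inputs, and returns one predicted label per test input.
Example = Tuple[float, str]
Method = Callable[[List[Example], List[float]], List[str]]


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights proportional to class support (assumed metric)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


def evaluate_all(methods: Dict[str, Method],
                 folds: List[Tuple[List[Example], List[Example]]]) -> Dict[str, float]:
    """Score every method on the same folds with the same metric; return mean scores."""
    results = {}
    for name, method in methods.items():
        scores = []
        for train, test in folds:
            x_test = [x for x, _ in test]
            y_test = [y for _, y in test]
            scores.append(weighted_f1(y_test, method(train, x_test)))
        results[name] = sum(scores) / len(scores)
    return results


def majority_baseline(train: List[Example], x_test: List[float]) -> List[str]:
    """Toy method: always predict the most frequent training label."""
    most_common = Counter(y for _, y in train).most_common(1)[0][0]
    return [most_common] * len(x_test)


def threshold_rule(train: List[Example], x_test: List[float]) -> List[str]:
    """Toy method: a fixed threshold on a single scalar feature."""
    return ["happy" if x >= 0.5 else "sad" for x in x_test]


# Toy data: each example is (scalar feature, emotion label).
train = [(0.9, "happy"), (0.8, "happy"), (0.1, "sad"), (0.7, "happy")]
test = [(0.95, "happy"), (0.05, "sad"), (0.2, "sad")]
results = evaluate_all({"majority": majority_baseline, "threshold": threshold_rule},
                       [(train, test)])
```

Because `evaluate_all` owns the folds and the metric, a new method only has to implement the `Method` signature; it cannot silently change the split or the scoring rule, which is the kind of inconsistency a shared benchmark is meant to remove.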