Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark

Zheng Lian, Haiyang Sun, Licai Sun, Lan Chen, Haoyu Chen, Hao Gu, Zhuofan Wen, Shun Chen, Siyuan Zhang, Hailiang Yao, Mingyu Xu, Kang Chen, Bin Liu, Rui Liu, Shan Liang, Ya Li, Jiangyan Yi, Jianhua Tao

2024-10-02

Abstract: Multimodal Emotion Recognition (MER) is an important research topic. This paper advocates for a transformative paradigm in MER. The rationale behind our work is that current approaches often rely on a limited set of basic emotion labels, which do not adequately represent the rich spectrum of human emotions. These traditional and overly simplistic emotion categories fail to capture the inherent complexity and subtlety of human emotional experiences, leading to limited generalizability and practicality. Therefore, we propose a new MER paradigm called Open-vocabulary MER (OV-MER), which encompasses a broader range of emotion labels to reflect the richness of human emotions. This paradigm relaxes the label space, allowing for the prediction of arbitrary numbers and categories of emotions. To support this transition, we provide a comprehensive solution that includes a newly constructed database based on LLM and human collaborative annotations, along with corresponding metrics and a series of benchmarks. We hope this work advances emotion recognition from basic emotions to more nuanced emotions, contributing to the development of emotional AI.

Human-Computer Interaction

What problem does this paper attempt to address?

The paper addresses a limitation of current Multimodal Emotion Recognition (MER) methods: they typically rely on a limited set of basic emotion labels, which cannot adequately represent the richness and complexity of human emotions. This limits how well such methods generalize and how practical they are in real-world applications. The authors therefore propose a new paradigm, Open-vocabulary Multimodal Emotion Recognition (OV-MER), which covers a broader range of emotion labels. OV-MER relaxes the constraints of the label space, allowing a model to predict an arbitrary number and variety of emotions. To support this shift, the authors construct a new database annotated collaboratively by large language models (LLMs) and humans, and propose corresponding evaluation metrics and a series of benchmarks. They hope this work will move emotion recognition from basic emotions toward more nuanced ones, advancing emotional artificial intelligence.
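To make the idea of an open label space concrete, here is a minimal sketch of how a prediction containing any number of free-form emotion words could be scored against a reference set. This is an illustration under stated assumptions, not the metric defined in the paper: the function names, the `SYNONYMS` mapping, and the choice of set-level precision, recall, and F1 are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical synonym map (an assumption for illustration): free-form
# labels are collapsed onto a canonical word before comparison.
SYNONYMS: Dict[str, str] = {
    "joyful": "happy",
    "glad": "happy",
    "anxious": "worried",
}


def normalize(labels: Iterable[str], synonyms: Dict[str, str]) -> Set[str]:
    """Lowercase, strip, and map each label to its canonical form."""
    cleaned = (label.strip().lower() for label in labels)
    return {synonyms.get(label, label) for label in cleaned if label}


def set_level_scores(
    predicted: Iterable[str],
    reference: Iterable[str],
    synonyms: Optional[Dict[str, str]] = None,
) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) between two open-vocabulary label sets."""
    mapping = synonyms or {}
    pred = normalize(predicted, mapping)
    ref = normalize(reference, mapping)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# The model may emit any number of labels, none drawn from a fixed list.
precision, recall, f1 = set_level_scores(
    predicted=["Joyful", "surprised", "nervous"],
    reference=["happy", "surprised"],
    synonyms=SYNONYMS,
)
print(precision, recall, f1)
```

Because both sides are treated as sets of arbitrary size, the score does not depend on a predefined label inventory. Grouping near-synonyms before comparison is one simple way to avoid penalizing a prediction for wording alone.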

MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition

A Survey of Deep Learning-Based Multimodal Emotion Recognition: Speech, Text, and Face

Explainable Multimodal Emotion Reasoning: a Promising Way to Open-set Emotion Recognition

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Multimodal Emotion Recognition based on Facial Expressions, Speech, and EEG

Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model

A Versatile Multimodal Learning Framework For Zero-shot Emotion Recognition

Smile upon the Face but Sadness in the Eyes: Emotion Recognition based on Facial Expressions and Eye Behaviors

Multimodal Emotion Recognition by Extracting Common and Modality-Specific Information

VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints

AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition

Deep Imbalanced Learning for Multimodal Emotion Recognition in Conversations

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos

Multimodal Adaptive Emotion Transformer with Flexible Modality Inputs on A Novel Dataset with Continuous Labels

Multiplex graph aggregation and feature refinement for unsupervised incomplete multimodal emotion recognition

A Multimodal Dataset for Mixed Emotion Recognition

MEMO-Bench: A Multiple Benchmark for Text-to-Image and Multimodal Large Language Models on Human Emotion Analysis