Mining High-quality Samples from Raw Data and Majority Voting Method for Multimodal Emotion Recognition

Qifei Li,Yingming Gao,Ya Li
DOI: https://doi.org/10.1145/3581783.3612862
2023-01-01
Abstract:Automatic emotion recognition has a wide range of applications in human-computer interaction. In this paper, we present our work in the Multimodel Emotion Recognition (MER) 2023, which contains three sub-challenges: MER-MULTI, MER-NOISE, and MER-SEMI. We first use a vanilla semi-supervised method to mine high quality samples from the MER-SEMI unlabeled dataset to expand the training set. Specifically, we ensemble three models trained with the official training set by a majority voting method, which is used to select samples with high prediction consistency. The selected samples together with the original training set are further augmented by adding noise. Then, the features of different modalities of expanded dataset are extracted from several pre-trained or fine-tuned models, and they are subsequently used to create different feature combinations to capture more effective emotion representations. Besides, we employ early fusion of different modal features and late fusion of different recognition models to obtain the final prediction. Experimental results show that our proposed method improves the performance over the official baselines by 30.4%, 55.3% and 1.57% for the three sub-challenges and ranks 4, 3, and 5, respectively. The present work sheds light on high-quality data mining and model ensemble by majority voting for multimodal emotion recognition.
What problem does this paper attempt to address?