Multi-modal emotion recognition using tensor decomposition fusion and self-supervised multi-tasking
Rui Wang, Jiawei Zhu, Shoujin Wang, Tao Wang, Jingze Huang, Xianxun Zhu
DOI: https://doi.org/10.1007/s13735-024-00347-3
2024-09-04
International Journal of Multimedia Information Retrieval
Abstract: Technological advances now make it possible to capture rich dialogue content, vocal tone, text, and visual data through microphones, the internet, and cameras. However, emotion analysis that relies on a single modality often fails to reflect the true emotional state, because it overlooks the dynamic correlations between modalities. To address this, our study introduces a multimodal emotion recognition method that combines tensor decomposition fusion with self-supervised multi-task learning. The method first applies Tucker decomposition to reduce the model's parameter count, lowering the risk of overfitting. It then builds a learning mechanism for both multimodal and unimodal tasks and incorporates label generation, which lets it capture the emotional differences between modalities more accurately. We conducted extensive experiments and analyses on public datasets such as CMU-MOSI and CMU-MOSEI, and the results show that our method significantly outperforms existing methods. The related code is open-sourced at https://github.com/ZhuJw31/MMER-TD.
computer science, artificial intelligence, software engineering
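
The parameter reduction that the abstract attributes to Tucker decomposition can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a generic trilinear fusion of text, audio, and visual feature vectors, and the function names, feature dimensions, and ranks are all hypothetical values chosen for illustration. The sketch compares a dense fusion weight tensor with its Tucker-factorized form (a small core tensor plus one factor matrix per mode) and checks that both produce the same fused vector.

```python
import numpy as np


def tucker_fusion(x_t, x_a, x_v, core, U_t, U_a, U_v, U_o):
    """Fuse three modality vectors using Tucker factors (illustrative only)."""
    # Project each modality into its low-rank subspace.
    p_t = x_t @ U_t  # shape (r_t,)
    p_a = x_a @ U_a  # shape (r_a,)
    p_v = x_v @ U_v  # shape (r_v,)
    # Contract the projected vectors with the small core tensor.
    z = np.einsum("i,j,k,ijko->o", p_t, p_a, p_v, core)  # shape (r_o,)
    # Map the low-rank result to the output space.
    return U_o @ z  # shape (d_o,)


def full_fusion(x_t, x_a, x_v, W):
    """Same trilinear fusion with a dense weight tensor, for comparison."""
    return np.einsum("i,j,k,ijko->o", x_t, x_a, x_v, W)


def param_counts(dims, ranks):
    """Return (dense tensor parameters, Tucker-factorized parameters)."""
    full = int(np.prod(dims))
    tucker = int(np.prod(ranks)) + sum(d * r for d, r in zip(dims, ranks))
    return full, tucker


rng = np.random.default_rng(0)
dims = (32, 16, 20, 8)   # hypothetical sizes: text, audio, visual, output
ranks = (4, 4, 4, 4)     # hypothetical Tucker ranks, one per mode

U_t, U_a, U_v, U_o = (rng.standard_normal((d, r)) for d, r in zip(dims, ranks))
core = rng.standard_normal(ranks)
x_t, x_a, x_v = (rng.standard_normal(d) for d in dims[:3])

# Rebuild the dense tensor implied by the factors, only to verify equivalence.
W = np.einsum("abcd,ia,jb,kc,od->ijko", core, U_t, U_a, U_v, U_o)

fused = tucker_fusion(x_t, x_a, x_v, core, U_t, U_a, U_v, U_o)
reference = full_fusion(x_t, x_a, x_v, W)
full_params, tucker_params = param_counts(dims, ranks)

print(np.allclose(fused, reference))  # True
print(full_params, tucker_params)     # 81920 560
```

With these illustrative sizes the dense tensor holds 81,920 weights while the factorized form holds 560. The dense tensor `W` is materialized here only to confirm equivalence; in practice the factorized path avoids ever forming it.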