A text guided multi-task learning network for multimodal sentiment analysis

Yuanyi Luo, Rui Wu, Jiafeng Liu, Xianglong Tang
DOI: https://doi.org/10.1016/j.neucom.2023.126836
IF: 6
2023-09-01
Neurocomputing
Abstract: Multimodal Sentiment Analysis (MSA) is an active area of research that leverages multimodal signals for affective understanding of user-generated videos. Existing research tends to develop sophisticated fusion techniques to fuse unimodal representations into a multimodal representation and treat MSA as a single prediction task. However, we find that the text modality, with its pre-trained model (BERT), learns more semantic information and dominates training in multimodal models, inhibiting the learning of other modalities. In addition, the classification ability of each modality is suppressed by single-task learning. In this paper, we propose a text guided multi-task learning network to enhance the semantic information of non-text modalities and improve the learning ability of unimodal networks. We conducted experiments on three multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results show that our method outperforms the current SOTA method.
computer science, artificial intelligence