Multi-level Multi-task Representation Learning with Adaptive Fusion for Multimodal Sentiment Analysis
Chuanbo Zhu,Min Chen,Haomin Li,Sheng Zhang,Han Liang,Chao Sun,Yifan Liu,Jincai Chen
DOI: https://doi.org/10.1007/s00521-024-10678-1
2024-01-01
Neural Computing and Applications
Abstract:Multimodal sentiment analysis is an active task in multimodal intelligence, which aims to compute the user’s sentiment tendency from multimedia data. Generally, each modality is a specific and necessary perspective to express human sentiment, providing complementary and consensus information unavailable in a single modality. Nevertheless, the heterogeneous multimedia data often contain inconsistent and conflicting sentiment semantics that limits the model performance. In this work, we propose a Multi-level Multi-task Representation Learning with Adaptive Fusion (MuReLAF) network to bridge the semantic gap among different modalities. Specifically, we design a modality adaptive fusion block to adjust modality contributions dynamically. Besides, we build a multi-level multimodal representations framework to obtain modality-specific and modality-shared semantics by the multi-task learning strategy, where modality-specific semantics contain complementary information and modality-shared semantics include consensus information. Extensive experiments are conducted on four publicly available datasets: MOSI, MOSEI, SIMS, and SIMSV2(s), demonstrating that our model exhibits superior or comparable performance to state-of-the-art models. The achieved accuracies are 86.28