Multimodal Transformer with Adaptive Modality Weighting for Multimodal Sentiment Analysis

Yifeng Wang, Jiahao He, Di Wang, Quan Wang, Bo Wan, Xuemei Luo
DOI: https://doi.org/10.1016/j.neucom.2023.127181
IF: 6
2024-01-01
Neurocomputing
Abstract: Multimodal Sentiment Analysis (MSA) is a pivotal technology in multimedia research. The efficacy of MSA models hinges largely on the quality of multimodal fusion. Notably, not all modalities are equally important when conveying information pertinent to a specific task or application. Previous research, however, has either disregarded modality importance altogether or focused solely on the relative importance of linguistic versus non-linguistic modalities, neglecting differences in importance among the non-linguistic modalities themselves. To facilitate effective multimodal information fusion based on the relative importance of modalities, this paper proposes a novel multimodal fusion model named Multimodal Transformer with Adaptive Modality Weighting (MTAMW). Specifically, we introduce a multimodal adaptive weight matrix that allocates an appropriate weight to each modality based on its contribution to sentiment analysis. Furthermore, we introduce a multimodal attention mechanism that uses multiple Softmax functions to compute attention weights, thereby efficiently fusing multimodal information via a single-stream Transformer. Accounting for the relative importance of each modality during fusion makes more effective multimodal information fusion achievable. Extensive experiments on benchmark datasets show that MTAMW is superior or comparable to state-of-the-art methods on MSA tasks. The code for our experiments is available at https://github.com/Vamos66/MTAMW.
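The abstract describes the fusion mechanism only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that the tokens of all modalities are concatenated into one sequence (the single-stream setting), that one Softmax is applied per modality block of the attention scores, and that each block's output is scaled by a learnable per-modality weight. The function name `modality_weighted_attention`, the `segments` argument, and the use of a vector of scalar logits in place of the paper's adaptive weight matrix are all hypothetical simplifications.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def modality_weighted_attention(q, k, v, segments, modality_logits):
    """Hypothetical sketch of attention with one softmax per modality.

    q, k, v:          (n, d) arrays for one concatenated token sequence.
    segments:         list of (start, stop) index pairs, one per modality.
    modality_logits:  (num_modalities,) scalars standing in for learnable
                      modality weights (an assumption, not the paper's matrix).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # (n, n) scaled dot-product scores
    w = softmax(modality_logits)       # modality weights, normalized to sum to 1
    out = np.zeros_like(v, dtype=float)
    for m, (start, stop) in enumerate(segments):
        # Separate softmax over the keys that belong to modality m.
        attn = softmax(scores[:, start:stop], axis=-1)
        # Scale this modality's contribution by its weight.
        out += w[m] * (attn @ v[start:stop])
    return out, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed layout: 4 text tokens, 3 audio tokens, 2 visual tokens.
    segments = [(0, 4), (4, 7), (7, 9)]
    x = rng.normal(size=(9, 8))
    fused, weights = modality_weighted_attention(
        x, x, x, segments, np.array([1.0, 0.0, -1.0])
    )
    print(fused.shape, weights)
```

Normalizing within each modality block keeps a modality with many tokens from dominating the attention distribution purely by length; the per-modality weights then decide how much each block contributes to the fused representation. In a trained model the logits would be learned parameters rather than fixed values.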
What problem does this paper attempt to address?