TeFNA: Text-centered Fusion Network with crossmodal Attention for multimodal sentiment analysis

Changqin Huang, Junling Zhang, Xuemei Wu, Yi Wang, Ming Li, Xiaodi Huang
DOI: https://doi.org/10.1016/j.knosys.2023.110502
IF: 8.139
2023-03-30
Knowledge-Based Systems
Abstract: Multimodal sentiment analysis (MSA), which goes beyond the analysis of texts to include other modalities such as audio and visual data, has attracted a significant amount of attention. An effective fusion of sentiment information across multiple modalities is key to improving MSA performance. However, aligning multiple modalities during fusion faces challenges such as preserving modality-specific information. This paper proposes a Text-centered Fusion Network with crossmodal Attention (TeFNA), a multimodal fusion network that uses crossmodal attention to model unaligned multimodal temporal information. In particular, TeFNA employs a Text-Centered Aligned fusion method (TCA) that takes the text modality as the primary modality to improve the representation of fused features. In addition, TeFNA maximizes the mutual information between modality pairs to retain task-related emotional information, which ensures that the key information of each modality is preserved from input to fusion. Comprehensive experiments on the CMU-MOSI and CMU-MOSEI multimodal datasets show that the proposed model outperforms the compared methods on most of the metrics used.
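The abstract describes text-centered fusion only at a high level. The sketch below illustrates the general idea and is not the authors' implementation. It assumes generic single-head scaled dot-product crossmodal attention, with text features as queries and audio or visual features as keys and values, so that an unaligned sequence of any length is mapped onto the text timeline. The function names, the identity projection weights, and the concatenation-based fusion are hypothetical illustrations. The mutual-information objective is not shown.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied independently to each row.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def crossmodal_attention(text: Matrix, other: Matrix,
                         w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Text (T steps) attends over another modality (S steps); T and S may differ."""
    q = matmul(text, w_q)   # T x d queries from the primary (text) modality
    k = matmul(other, w_k)  # S x d keys from the auxiliary modality
    v = matmul(other, w_v)  # S x d values from the auxiliary modality
    d = len(w_q[0])
    scores = [[x / math.sqrt(d) for x in row] for row in matmul(q, transpose(k))]  # T x S
    weights = softmax_rows(scores)  # each text step distributes attention over S steps
    return matmul(weights, v)       # T x d, now aligned to the text timeline

def text_centered_fusion(text: Matrix, audio: Matrix, visual: Matrix,
                         params: Dict[str, Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Hypothetical fusion: concatenate text with text-aligned audio and visual features."""
    a2t = crossmodal_attention(text, audio, *params["audio"])
    v2t = crossmodal_attention(text, visual, *params["visual"])
    return [t + a + v for t, a, v in zip(text, a2t, v2t)]
```

Because text supplies the queries, the fused sequence always has one row per text step, no matter how many audio or visual frames were recorded. This is one simple way to handle unaligned inputs without explicit word-level alignment. For example, with 3 text steps, 5 audio frames, and 4 visual frames (all 2-dimensional), `text_centered_fusion` returns 3 rows of 6 features each.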
computer science, artificial intelligence
What problem does this paper attempt to address?