Prompt Link Multimodal Fusion in Multimodal Sentiment Analysis

Kang Zhu, Cunhang Fan, Jianhua Tao, Zhao Lv
DOI: https://doi.org/10.21437/interspeech.2024-1512
2024-01-01
Abstract: Multimodal sentiment analysis aims to analyze sentiment by integrating information from multiple modalities. Combining modalities is challenging because of the inherent distance between them. Although researchers have employed complex methods to reduce this distance, the connections established between modalities remain limited. In this paper, we introduce prompt learning and propose Prompt Link Multimodal Fusion (PLMF), which consists of three components: Channel Prompt Link (CPL), Spatial Prompt Link (SPL), and Fusion Result Constraints (FRC). CPL links fine-grained sentiment features in the channel dimension, while SPL connects overall sentiment semantic information in the temporal dimension. Because of the randomness of the connecting vectors, FRC is proposed to constrain the linkage toward the optimal fusion result. With these three modules working together, PLMF achieves state-of-the-art results on three publicly available datasets.
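The abstract names CPL and SPL but does not say how the prompts are attached to each modality. As a purely illustrative, shape-level sketch, assume that a shared prompt is either concatenated to every time step along the channel axis (CPL-style) or prepended as extra tokens along the time axis (SPL-style). The function names `link_channel` and `link_temporal`, the toy inputs, and this concatenation scheme are hypothetical and are not taken from the paper. FRC is not modeled.

```python
from typing import List

# Features are stored as a [time_steps][channels] matrix of floats.
Matrix = List[List[float]]


def link_channel(features: Matrix, prompt: List[float]) -> Matrix:
    """Hypothetical CPL-style link: append one shared prompt vector to the
    channel dimension of every time step (rows get wider)."""
    return [row + prompt for row in features]


def link_temporal(features: Matrix, prompt_tokens: Matrix) -> Matrix:
    """Hypothetical SPL-style link: prepend shared prompt tokens along the
    time axis (more rows, same channel width)."""
    width = len(features[0])
    if any(len(tok) != width for tok in prompt_tokens):
        raise ValueError("prompt tokens must match the feature channel width")
    return [list(tok) for tok in prompt_tokens] + [list(row) for row in features]


def shape(m: Matrix) -> tuple:
    """Return (time_steps, channels) for a non-empty matrix."""
    return (len(m), len(m[0]))


# Toy inputs: two modalities with different lengths but equal channel width.
text = [[0.1, 0.2, 0.3] for _ in range(4)]   # 4 steps, 3 channels
audio = [[0.5, 0.6, 0.7] for _ in range(6)]  # 6 steps, 3 channels

# The same prompt values are reused for both modalities; in this sketch,
# that sharing is what "links" them.
channel_prompt = [0.0, 0.0]
temporal_prompt = [[0.0, 0.0, 0.0] for _ in range(2)]

for name, feats in (("text", text), ("audio", audio)):
    print(name,
          shape(link_channel(feats, channel_prompt)),
          shape(link_temporal(feats, temporal_prompt)))
```

Under these assumptions, channel linking widens every time step while temporal linking adds time steps, so the script prints `text (4, 5) (6, 3)` and `audio (6, 5) (8, 3)`. In a trained model the prompt values would be learnable parameters rather than fixed zeros.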