Prompt Fusion Interaction Transformer for Aspect-Based Multimodal Sentiment Analysis

Dan Yang, Xiuhong Li, Zhe Li, Chenyu Zhou, Xiaofan Wang, Fan Chen
DOI: https://doi.org/10.1109/icme57554.2024.10687885
2024-01-01
Abstract: Aspect-based multimodal sentiment analysis (ABMSA) is a recent and popular research area that uses multiple modalities, such as text and images, to determine the sentiment orientation of opinion entities. The main challenge in multimodal sentiment analysis is dynamically modeling each modality and effectively fusing information across modalities. Existing methods have not considered fine-grained texture features in images, and direct fusion introduces ineffective features that are unrelated to sentiment. To address these limitations, we propose a new model for multimodal sentiment analysis, the multimodal prompt fusion interaction Transformer (MPFIT). We design two key components: 1) the image assist module (IAM), which leverages the self-attention mechanism and statistical pooling to obtain weighted mean and standard deviation vectors, enabling the model to focus on image texture information and reduce image noise; and 2) multimodal prompt fusion (MPF), which restricts multimodal fusion to interactions between small prompt tokens that capture vital information from the different modalities, allowing the model to focus on features that are more relevant to sentiment. Experimental results show that our model outperforms baseline models on two publicly available datasets, Twitter-2015 and Twitter-2017. We also conduct ablation experiments to evaluate the impact of each key component.
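The statistical pooling step mentioned for the IAM can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no equations, so the function name `attentive_stats_pooling`, the per-region `scores` input (standing in for whatever the self-attention layer produces), and the `eps` stabilizer are all assumptions made for illustration. The sketch turns the scores into softmax weights and returns the weighted mean concatenated with the weighted standard deviation of the region features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(features, scores, eps=1e-12):
    """Pool T feature vectors into one [mean ; std] vector.

    features: list of T vectors, each of length D (hypothetical image-region features).
    scores:   list of T unnormalized attention scores (assumed to come from an
              attention layer; here they are simply passed in).
    eps:      small constant added before the square root (an assumption).
    """
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted mean per feature dimension.
    mean = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    # Weighted variance around that mean, then standard deviation.
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, features))
        for d in range(dim)
    ]
    std = [math.sqrt(v + eps) for v in var]
    return mean + std  # concatenation, length 2 * D


# Usage: two regions with equal scores receive equal weight.
pooled = attentive_stats_pooling([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

With equal scores the weights are uniform, so the result reduces to the ordinary mean and standard deviation; unequal scores would shift both statistics toward the regions the attention layer favors.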