Multimodal Sentiment Analysis with Preferential Fusion and Distance-aware Contrastive Learning.

Feipeng Ma, Yueyi Zhang, Xiaoyan Sun
DOI: https://doi.org/10.1109/icme55011.2023.00237
2023-01-01
Abstract: Recent efforts on multimodal sentiment analysis (MSA) leverage data from multiple modalities, among which the text modality is relied on most heavily. However, the text modality often contains spurious correlations between text tokens and sentiment labels, leading to errors in sentiment analysis. To address this issue, we propose a new framework, PriSA, which incorporates preferential fusion and distance-aware contrastive learning. Specifically, we first propose a preferential inter-modal fusion method, which uses the text modality to guide the calculation of inter-modal correlations. The resulting inter-modal features are then used to calculate mixed-modal correlations through our proposed distance-aware contrastive learning, which leverages the distance information of the sentiment labels. Finally, we identify the sentiment information based on both the mixed-modal correlations and the discriminative intra-modal features extracted from the visual and audio modalities via a self-attention module. Experimental results show that our proposed PriSA achieves state-of-the-art performance on four datasets: MOSEI, MOSI, SIMS, and UR-FUNNY. The code is available at https://github.com/FeipengMa6/PriSA.
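The abstract does not state the loss formula, so the following is only a minimal sketch of one way a contrastive objective could use the distance between sentiment labels. It is not the PriSA implementation. The function name, the `temperature` parameter, and the `exp(-|y_i - y_j|)` weighting are all assumptions made for illustration: pairs whose labels are close are treated as softer positives, and pairs whose labels are far apart contribute little.

```python
import numpy as np


def distance_aware_contrastive_loss(features, labels, temperature=0.1):
    """Hypothetical label-distance-weighted contrastive loss (illustrative only).

    features: (N, D) array of fused features, N >= 2, no all-zero rows.
    labels:   (N,) array of continuous sentiment scores.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    n = features.shape[0]

    # Pairwise cosine similarity, scaled by the temperature.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = normed @ normed.T / temperature

    # Exclude self-pairs from the softmax.
    off_diag = ~np.eye(n, dtype=bool)
    logits = np.where(off_diag, logits, -np.inf)

    # Numerically stable row-wise log-softmax over the other samples.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Assumed soft targets: closer labels receive larger weight.
    label_dist = np.abs(labels[:, None] - labels[None, :])
    weights = np.where(off_diag, np.exp(-label_dist), 0.0)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Cross-entropy between the soft targets and the similarity distribution.
    per_sample = -(weights * np.where(off_diag, log_prob, 0.0)).sum(axis=1)
    return float(per_sample.mean())


# Two feature clusters; labels either agree or disagree with the clusters.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
loss_aligned = distance_aware_contrastive_loss(feats, [2.0, 2.0, -2.0, -2.0])
loss_shuffled = distance_aware_contrastive_loss(feats, [2.0, -2.0, 2.0, -2.0])
```

Under these assumptions, `loss_aligned` comes out lower than `loss_shuffled`, because samples with nearby sentiment scores already sit close together in feature space.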