Abstract: Multimodal sentiment analysis centers on fusing multiple modalities. Because modality representation learning is a key step toward better fusion, how to fully learn the sentiment information carried by non-text modalities is a problem worth exploring. How to further improve the accuracy of sentiment polarity prediction also remains open. To address these problems, we propose a multimodal sentiment analysis model with effective context semantic modality fusion and sentiment polarity correction (CSMF-SPC). First, we design a low-rank multimodal fusion network based on context semantic modality (CSM-LRMFN). CSM-LRMFN uses a bi-directional long short-term memory network to extract the context semantic features of the non-text modalities and BERT to extract the features of the text modality. CSM-LRMFN then adopts a low-rank multimodal fusion method to fully extract the interaction information among the modalities with contextual semantics. Second, unlike previous studies, to improve the accuracy of sentiment polarity prediction we design a weight self-adjusting sentiment polarity penalty loss function, which, through backpropagation, drives the model to learn more of the sentiment features that benefit prediction. Finally, a series of comparative experiments is conducted on the CMU-MOSI and CMU-MOSEI datasets. Compared with current representative models, CSMF-SPC achieves better experimental results. In particular, the Acc-2 (including zero) metric increases by 1.41% and 1.58% on the word-aligned and unaligned CMU-MOSI datasets, respectively, and by 1.50% and 2.14%, respectively, on the CMU-MOSEI dataset, which indicates that the improvements in CSMF-SPC are effective.
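The low-rank fusion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used low-rank formulation, in which each modality vector is extended with a constant 1, projected by a set of rank-many factor matrices, summed over the rank, and then combined across modalities by elementwise multiplication. The abstract does not give the exact CSM-LRMFN equations, so the function names (`append_one`, `project`, `low_rank_fusion`, `random_factors`), the dimensions, and the random initialization below are all illustrative assumptions, not the authors' implementation.

```python
import random


def append_one(z):
    """Extend a modality vector with a constant 1 (assumed convention)."""
    return list(z) + [1.0]


def project(factors, z):
    """Sum over the rank of (factor matrix @ z).

    factors: list of r matrices, each with out_dim rows of len(z) weights.
    """
    out = [0.0] * len(factors[0])
    for matrix in factors:
        for k, row in enumerate(matrix):
            out[k] += sum(w * x for w, x in zip(row, z))
    return out


def low_rank_fusion(modal_vectors, modal_factors):
    """Fuse modalities by elementwise product of their low-rank projections."""
    fused = None
    for z, factors in zip(modal_vectors, modal_factors):
        p = project(factors, append_one(z))
        fused = p if fused is None else [a * b for a, b in zip(fused, p)]
    return fused


def random_factors(rank, out_dim, in_dim, rng):
    """Hypothetical initializer: small uniform weights, in_dim + 1 columns."""
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + 1)] for _ in range(out_dim)]
        for _ in range(rank)
    ]


# Illustrative usage: text, audio and visual vectors of assumed sizes.
rng = random.Random(0)
dims = {"text": 6, "audio": 3, "visual": 5}
vectors = [[rng.random() for _ in range(d)] for d in dims.values()]
factors = [random_factors(rank=2, out_dim=4, in_dim=d, rng=rng) for d in dims.values()]
fused = low_rank_fusion(vectors, factors)
print(len(fused))  # fused representation has out_dim entries
```

In this sketch the input vectors stand in for the BERT text features and the BiLSTM features of the non-text modalities. The factorized form avoids building the full outer-product tensor across modalities, which is the usual motivation for low-rank fusion.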

Weakly Correlated Multimodal Sentiment Analysis: New Dataset and Topic-oriented Model

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Robust-MSA: Understanding the Impact of Modality Noise on Multimodal Sentiment Analysis

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

CorMulT: A Semi-supervised Modality Correlation-aware Multimodal Transformer for Sentiment Analysis

RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal Sentiment Classification

Image-Text Multimodal Emotion Classification via Multi-View Attentional Network

Weakening the Dominant Role of Text: CMOSI Dataset and Multimodal Semantic Enhancement Network

Multimodal sentiment analysis based on multiple attention

Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism

Multi-layer cross-modality attention fusion network for multimodal sentiment analysis

Semantic-specific multimodal relation learning for sentiment analysis

CSMF-SPC: Multimodal Sentiment Analysis Model with Effective Context Semantic Modality Fusion and Sentiment Polarity Correction

Learning Speaker-Independent Multimodal Representation for Sentiment Analysis

Multimodal Sentiment Analysis Based on Transformer and Low-rank Fusion

Multimodal sentiment analysis based on multi-head attention mechanism

Various syncretic co‐attention network for multimodal sentiment analysis

Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations

Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling

Predicting Microblog Sentiments Via Weakly Supervised Multimodal Deep Learning

Text-Centric Multimodal Contrastive Learning for Sentiment Analysis