Abstract: Multimodal sentiment analysis focuses on fusing multiple modalities. Because modality representation learning is a key step toward better fusion, how to fully learn the sentiment information of non-text modalities is a problem worth exploring. How to further improve the accuracy of sentiment polarity prediction also remains to be studied. To address these problems, we propose a multimodal sentiment analysis model with effective context semantic modality fusion and sentiment polarity correction (CSMF-SPC). First, we design a low-rank multimodal fusion network based on context semantic modality (CSM-LRMFN). CSM-LRMFN uses a bi-directional long short-term memory network to extract the context semantic features of the non-text modalities and BERT to extract the features of the text modality. It then adopts a low-rank multimodal fusion method to fully extract the interaction information among modalities with contextual semantics. Unlike previous studies, to improve the accuracy of sentiment polarity prediction, we design a weight self-adjusting sentiment polarity penalty loss function, which, through backpropagation, leads the model to learn more sentiment features that benefit prediction. Finally, a series of comparative experiments is conducted on the CMU-MOSI and CMU-MOSEI datasets. Compared with current representative models, CSMF-SPC achieves better results: the Acc-2 (including zero) metric increases by 1.41% and 1.58% on the word-aligned and unaligned CMU-MOSI datasets, respectively, and by 1.50% and 2.14%, respectively, on the CMU-MOSEI dataset, which indicates that the improvements in CSMF-SPC are effective.
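
The low-rank fusion step mentioned in the abstract can be illustrated with a minimal sketch of the general low-rank multimodal fusion idea. It is not the CSM-LRMFN implementation: the function name `low_rank_fusion`, the factor layout, the feature sizes, and the rank are all assumptions made for illustration, and random vectors stand in for the BiLSTM and BERT features. Each modality vector is extended with a constant 1, projected by its own rank-specific factors, and the projections are multiplied elementwise across modalities and summed over the rank, so the full outer-product tensor is never built.

```python
import random


def low_rank_fusion(features, factors):
    """Fuse modality vectors without building the full outer-product tensor.

    features: one list of floats per modality.
    factors:  per modality, a list of `rank` matrices, each with
              len(feature) + 1 rows and d_out columns (hypothetical layout).
    """
    rank = len(factors[0])
    d_out = len(factors[0][0][0])
    fused = [[1.0] * d_out for _ in range(rank)]
    for z, w in zip(features, factors):
        z1 = list(z) + [1.0]  # the appended 1 keeps unimodal and bimodal terms
        for r in range(rank):
            for o in range(d_out):
                # Project this modality with its r-th factor, then multiply
                # into the running elementwise product across modalities.
                proj = sum(z1[d] * w[r][d][o] for d in range(len(z1)))
                fused[r][o] *= proj
    # Sum the rank-1 contributions to obtain the fused vector.
    return [sum(fused[r][o] for r in range(rank)) for o in range(d_out)]


random.seed(0)
dims = [3, 4, 5]  # illustrative audio, visual, and text feature sizes
rank, d_out = 2, 2  # illustrative rank and fused-vector size
features = [[random.gauss(0, 1) for _ in range(d)] for d in dims]
factors = [
    [[[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d + 1)]
     for _ in range(rank)]
    for d in dims
]
h = low_rank_fusion(features, factors)
```

With these sizes the factors hold 2 x 2 x (4 + 5 + 6) = 60 weights, whereas an explicit fusion tensor over the three extended vectors would hold 4 x 5 x 6 x 2 = 240. The gap widens quickly as feature sizes grow.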

TGMoE: A Text Guided Mixture-of-Experts Model for Multimodal Sentiment Analysis

A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement

MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts

Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism

Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement Learning

Multi-layer cross-modality attention fusion network for multimodal sentiment analysis

Text-oriented Modality Reinforcement Network for Multimodal Sentiment Analysis from Unaligned Multimodal Sequences

Multimodal sentiment analysis based on multi-head attention mechanism

Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

CSMF-SPC: Multimodal Sentiment Analysis Model with Effective Context Semantic Modality Fusion and Sentiment Polarity Correction

Multimodal Sentiment Analysis of Graphic Texts Based on Multicategorical Relative Fusion

Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis

A cross modal hierarchical fusion multimodal sentiment analysis method based on multi-task learning

Multi-Feature Fusion Multi-Modal Sentiment Analysis Model Based on Cross-Attention Mechanism

Balanced sentimental information via multimodal interaction model

A text guided multi-task learning network for multimodal sentiment analysis

Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling

M$^{3}$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning

Multimodal sentiment analysis based on multiple attention

M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis