Abstract: Multimodal sentiment classification is a notable research field that aims to distill sentiment information from sequential multimodal data and classify its sentiment orientation. Most existing sentiment recognition algorithms explore multimodal fusion schemes and achieve good performance. However, two key challenges remain. First, inter- and intra-modality features must be extracted effectively before fusion while ambiguity is reduced. Second, the model must learn modality-invariant representations that capture the underlying similarities between modalities. In this paper, we present a modality-invariant temporal learning technique and a new gated inter-modality attention mechanism to address these issues. For the first challenge, the proposed gated inter-modality attention mechanism performs modality interactions and adaptively filters inconsistencies across modalities. We also use parallel structures to learn more comprehensive sentiment information in pairs (i.e., acoustic and visual). For the second challenge, we treat each modality as a multivariate Gaussian distribution (considering each timestamp as a single Gaussian distribution) and use the KL divergence to capture implicit temporal, distribution-level similarities. These strategies help reduce domain shifts between modalities and extract effective sequential modality-invariant representations. Experiments on two public datasets (YouTube and MOUD) show that the proposed method outperforms state-of-the-art multimodal sentiment classification methods.
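The distribution-level alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several details are assumptions made only for illustration: each timestamp of a modality is represented by a hypothetical (mean, log-variance) pair produced by some encoder, covariances are diagonal, and timestamps are treated as independent, so the KL divergence between two aligned modality sequences is the sum of per-timestamp closed-form KL terms. The function names `kl_diag_gaussian`, `temporal_kl`, and `symmetric_temporal_kl` are hypothetical, and the symmetric variant is an illustrative choice that the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-timestamp representation: (mean vector, log-variance vector).
# Diagonal covariance is an assumption of this sketch, not taken from the paper.
GaussianStep = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussian(mu_p: Sequence[float], logvar_p: Sequence[float],
                     mu_q: Sequence[float], logvar_q: Sequence[float]) -> float:
    """Closed-form KL(P || Q) for two diagonal Gaussians."""
    if not (len(mu_p) == len(logvar_p) == len(mu_q) == len(logvar_q)):
        raise ValueError("all vectors must have the same dimension")
    total = 0.0
    for m1, lv1, m2, lv2 in zip(mu_p, logvar_p, mu_q, logvar_q):
        # Per dimension: log(var_q / var_p) + (var_p + (m1 - m2)^2) / var_q - 1
        total += (lv2 - lv1) + (math.exp(lv1) + (m1 - m2) ** 2) / math.exp(lv2) - 1.0
    return 0.5 * total


def temporal_kl(seq_p: List[GaussianStep], seq_q: List[GaussianStep]) -> float:
    """KL between two time-aligned Gaussian sequences.

    Assuming independent timestamps, the KL of the joint distribution
    equals the sum of the per-timestamp KL terms.
    """
    if len(seq_p) != len(seq_q):
        raise ValueError("sequences must have the same number of timestamps")
    return sum(
        kl_diag_gaussian(mp, lp, mq, lq)
        for (mp, lp), (mq, lq) in zip(seq_p, seq_q)
    )


def symmetric_temporal_kl(seq_p: List[GaussianStep],
                          seq_q: List[GaussianStep]) -> float:
    """Average of both KL directions (illustrative; KL itself is asymmetric)."""
    return 0.5 * (temporal_kl(seq_p, seq_q) + temporal_kl(seq_q, seq_p))


if __name__ == "__main__":
    # Two toy 2-step sequences: identical at t=0, mean shifted by 1 at t=1.
    acoustic = [([0.0, 0.0], [0.0, 0.0]), ([0.0, 0.0], [0.0, 0.0])]
    visual = [([0.0, 0.0], [0.0, 0.0]), ([1.0, 0.0], [0.0, 0.0])]
    print(temporal_kl(acoustic, visual))  # 0.5
```

In a training setting, a term like this would be minimized between modality pairs so that their per-timestamp distributions move closer; whether to use one KL direction, a symmetric form, a sum, or a mean over timestamps is a design choice the abstract leaves open.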

Multimodal Sentiment Analysis Based on Attentional Temporal Convolutional Network and Multi-Layer Feature Fusion

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Multi-Feature Fusion for Multimodal Attentive Sentiment Analysis

A transformer-encoder-based multimodal multi-attention fusion network for sentiment analysis

Multi-Feature Fusion Multi-Modal Sentiment Analysis Model Based on Cross-Attention Mechanism

Multimodal sentiment analysis based on multiple attention

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis

Feature Extraction Network with Attention Mechanism for Data Enhancement and Recombination Fusion for Multimodal Sentiment Analysis

Sentiment Analysis of Social Media Comments Based on Multimodal Attention Fusion Network

Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling

Multimodal Sentiment Analysis Based on a Cross-Modal Multihead Attention Mechanism

TeFNA: Text-centered Fusion Network with crossmodal Attention for multimodal sentiment analysis

A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement

Context-Dependent Multimodal Sentiment Analysis Based on a Complex Attention Mechanism

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism

Tri-Modalities Fusion for Multimodal Sentiment Analysis

A multimodal sentiment recognition method based on attention mechanism

Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis