Abstract: Multimodal sentiment classification is a research field that aims to refine sentiment information and classify sentiment tendency from sequential multimodal data. Most existing sentiment recognition algorithms explore multimodal fusion schemes and achieve good performance. However, two key challenges remain. First, inter- and intra-modality features must be extracted effectively prior to fusion while ambiguity is reduced. Second, the model must learn modality-invariant representations that capture the underlying similarities between modalities. In this paper, we present a modality-invariant temporal learning technique and a new gated inter-modality attention mechanism to address these issues. For the first challenge, the proposed gated inter-modality attention mechanism performs modality interactions and adaptively filters inconsistencies among modalities. We also use parallel structures to learn more comprehensive sentiment information in pairs (i.e., acoustic and visual). For the second challenge, we treat each modality as a multivariate Gaussian distribution (considering each timestamp as a single Gaussian distribution) and use the KL divergence to capture implicit temporal distribution-level similarities. These strategies help reduce domain shifts between modalities and extract effective sequential modality-invariant representations. We conducted experiments on several public datasets (i.e., YouTube and MOUD), and the results show that the proposed method outperforms state-of-the-art multimodal sentiment categorization methods.
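The distribution-level comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each timestamp of a modality is summarized by a mean vector and a diagonal covariance, that the two sequences are already aligned to the same length, and that the per-timestamp KL values are simply averaged. The function names `kl_diag_gaussians` and `temporal_kl` are hypothetical, and the abstract does not state how the Gaussian parameters are obtained or how the per-timestamp terms are combined.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two Gaussians with diagonal covariance (assumption).

    Uses the closed form, summed over dimensions:
    0.5 * (log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1)
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def temporal_kl(seq_p, seq_q):
    """Average per-timestamp KL between two aligned modality sequences.

    Each sequence is a list of (mean, variance) pairs, one pair per
    timestamp. Averaging over time is an illustrative choice only.
    """
    if len(seq_p) != len(seq_q):
        raise ValueError("sequences must be aligned to the same length")
    kls = [
        kl_diag_gaussians(mp, vp, mq, vq)
        for (mp, vp), (mq, vq) in zip(seq_p, seq_q)
    ]
    return sum(kls) / len(kls)


# Toy data: two timestamps, two feature dimensions per modality.
acoustic = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [1.0, 1.0])]
visual = [([0.0, 0.0], [1.0, 1.0]), ([0.0, 0.0], [1.0, 1.0])]
print(temporal_kl(acoustic, visual))
```

In a training setup, a term of this kind would typically be added to the classification loss so that the per-timestamp distributions of different modalities are pulled closer together. The weighting of that term is left open here because the abstract does not specify it.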

An Autoencoder-based Self-Supervised Learning for Multimodal Sentiment Analysis

Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Learning Speaker-Independent Multimodal Representation for Sentiment Analysis

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Unsupervised Multimodal Language Representations using Convolutional Autoencoders

Self-HCL: Self-Supervised Multitask Learning with Hybrid Contrastive Learning Strategy for Multimodal Sentiment Analysis

Shared and Private Information Learning in Multimodal Sentiment Analysis with Deep Modal Alignment and Self-supervised Multi-Task Learning

Dynamic Weighted Multitask Learning and Contrastive Learning for Multimodal Sentiment Analysis

Multimodal Contrastive Learning via Uni-Modal Coding and Cross-Modal Prediction for Multimodal Sentiment Analysis

AMSA: Adaptive Multimodal Learning for Sentiment Analysis

Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis

Multimodal Sentiment Analysis Representations Learning via Contrastive Learning with Condense Attention Fusion

Low-rank tensor fusion and self-supervised multi-task multimodal sentiment analysis

Predicting Microblog Sentiments Via Weakly Supervised Multimodal Deep Learning

Improving Multimodal Sentiment Analysis: Supervised Angular Margin-based Contrastive Learning for Enhanced Fusion Representation

Sentiment-aware Multimodal Pre-Training for Multimodal Sentiment Analysis

Leveraging Vision-Language Pre-Trained Model and Contrastive Learning for Enhanced Multimodal Sentiment Analysis

Multimodal Sentiment Analysis Missing Modality Reconstruction Network Based on Shared-Specific Features

Multi-level Correlation Mining Framework with Self-Supervised Label Generation for Multimodal Sentiment Analysis