Abstract: Multimodal sentiment classification is a notable research field that aims to refine sentiment information and classify sentiment tendency from sequential multimodal data. Most existing sentiment recognition algorithms explore multimodal fusion schemes and achieve good performance. However, two key challenges remain. The first is to extract inter- and intra-modality features effectively prior to fusion while reducing ambiguity. The second is to learn modality-invariant representations that capture the underlying similarities between modalities. In this paper, we present a modality-invariant temporal learning technique and a new gated inter-modality attention mechanism to address these issues. For the first challenge, the proposed gated inter-modality attention mechanism performs modality interactions and adaptively filters inconsistencies across modalities. We also use parallel structures to learn more comprehensive sentiment information from modality pairs (i.e., acoustic and visual). For the second challenge, we treat each modality as a multivariate Gaussian distribution (considering each timestamp as a single Gaussian distribution) and use the KL divergence to capture the implicit temporal distribution-level similarities. These strategies help reduce domain shifts between modalities and extract effective sequential modality-invariant representations. We have conducted experiments on several public datasets (i.e., YouTube and MOUD), and the results show that the proposed method outperforms state-of-the-art multimodal sentiment classification methods.
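The distribution-level similarity described in the abstract can be illustrated with a minimal sketch. It assumes each timestamp of a modality is summarized by a diagonal-covariance Gaussian (a mean and a variance vector) and that the per-timestamp KL divergences are averaged over time. The abstract does not specify the covariance structure, the direction of the divergence, or how timestamps are aggregated, so those choices and the function names `diag_gaussian_kl` and `temporal_kl_loss` are hypothetical and not taken from the paper.

```python
import numpy as np


def diag_gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between diagonal Gaussians, one per timestamp.

    All inputs have shape (T, D): T timestamps, D feature dimensions.
    Returns an array of shape (T,) holding one divergence per timestamp.
    Diagonal covariance is an assumption made for this sketch.
    """
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0,
        axis=-1,
    )


def temporal_kl_loss(mu_p, var_p, mu_q, var_q):
    """Average the per-timestamp KL over the sequence (assumed aggregation)."""
    return float(np.mean(diag_gaussian_kl(mu_p, var_p, mu_q, var_q)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D = 20, 8  # illustrative sequence length and feature size
    # Stand-ins for per-timestamp statistics of two modalities.
    mu_a, mu_v = rng.normal(size=(T, D)), rng.normal(size=(T, D))
    var_a = rng.uniform(0.5, 1.5, size=(T, D))
    var_v = rng.uniform(0.5, 1.5, size=(T, D))
    print("acoustic vs visual:", temporal_kl_loss(mu_a, var_a, mu_v, var_v))
    print("acoustic vs itself:", temporal_kl_loss(mu_a, var_a, mu_a, var_a))
```

Under these assumptions, minimizing such a term during training would pull the per-timestamp distributions of two modalities toward each other, which is one way to read the abstract's goal of reducing domain shift between modalities.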

Multi-Task Momentum Distillation for Multimodal Sentiment Analysis

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Multimodal Sentiment Analysis With Two-Phase Multi-Task Learning

Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Dynamic Weighted Multitask Learning and Contrastive Learning for Multimodal Sentiment Analysis

A text guided multi-task learning network for multimodal sentiment analysis

Adaptive Modality Distillation for Separable Multimodal Sentiment Analysis

Low-rank tensor fusion and self-supervised multi-task multimodal sentiment analysis

M$^{3}$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning

A Unified Self-Distillation Framework for Multimodal Sentiment Analysis with Uncertain Missing Modalities

Improving the Modality Representation with Multi-View Contrastive Learning for Multimodal Sentiment Analysis

Multi-Grained Fusion Network with Self-Distillation for Aspect-Based Multimodal Sentiment Analysis

Text-oriented Modality Reinforcement Network for Multimodal Sentiment Analysis from Unaligned Multimodal Sequences

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

Contrastive Knowledge Distillation for Robust Multimodal Sentiment Analysis

Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model

Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis

Multimodal Representations Learning Based on Mutual Information Maximization and Minimization and Identity Embedding for Multimodal Sentiment Analysis

A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement