Abstract: Multimodal sentiment classification is a notable research field that aims to extract sentiment information and classify sentiment tendency from sequential multimodal data. Most existing sentiment recognition algorithms explore multimodal fusion schemes and achieve good performance. However, two key challenges remain. First, it is essential to effectively extract inter- and intra-modality features prior to fusion while simultaneously reducing ambiguity. Second, it is necessary to learn modality-invariant representations that capture the underlying similarities across modalities. In this paper, we present a modality-invariant temporal learning technique and a new gated inter-modality attention mechanism to address these issues. For the first challenge, the proposed gated inter-modality attention mechanism performs modality interactions and adaptively filters inconsistencies across modalities. We also use parallel structures to learn more comprehensive sentiment information in pairs (i.e., acoustic and visual). To address the second challenge, we treat each modality as a multivariate Gaussian distribution (considering each timestamp as a single Gaussian distribution) and use the KL divergence to capture the implicit temporal distribution-level similarities. These strategies help reduce domain shift between modalities and extract effective sequential modality-invariant representations. We conducted experiments on several public datasets (i.e., YouTube and MOUD), and the results show that the proposed method outperforms state-of-the-art multimodal sentiment classification methods.
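The distribution-level similarity described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal covariances, two modality sequences already aligned to the same number of timestamps, a one-directional KL, and a simple average over time. The function names `kl_diag_gaussians` and `temporal_kl` are hypothetical.

```python
import math

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance.

    Assumption: diagonal covariance, given as per-dimension variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        # Closed-form per-dimension term of the Gaussian KL divergence.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def temporal_kl(seq_a, seq_b):
    """Average per-timestamp KL between two modality sequences.

    Each sequence is a list of (mean, variance) pairs, one Gaussian per
    timestamp. Averaging over time is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of timestamps")
    kls = [
        kl_diag_gaussians(mu_a, var_a, mu_b, var_b)
        for (mu_a, var_a), (mu_b, var_b) in zip(seq_a, seq_b)
    ]
    return sum(kls) / len(kls)

# Toy example: two timestamps, two feature dimensions per modality.
acoustic = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 2.0], [0.5, 0.5])]
visual = [([0.5, 0.0], [1.0, 2.0]), ([1.0, 1.5], [0.5, 1.0])]
print(temporal_kl(acoustic, visual))
```

A smaller value indicates that the two modalities have closer per-timestamp distributions. In a training setup, a term of this kind could serve as a loss that pulls modality representations together, though how the paper weights or symmetrizes it is not stated in the abstract.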

Multi-Modal Music Mood Classification Using Co-Training

Multimodal Music Mood Classification by Fusion of Audio and Lyrics

Graph-Based Multimodal Music Mood Classification in Discriminative Latent Space

Automatic Music Emotion Classification Using a New Classification Algorithm

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

Automatic Music Mood Classification by Learning Cross-Media Relevance Between Audio and Lyrics

Boosting for Multi-Modal Music Emotion Classification

Enhancing Music Mood Recognition with LLMs and Audio Signal Processing: A Multimodal Approach

Improve the application of reinforcement learning and multi‐modal information in music sentiment analysis

Semi-Supervised Classification of Musical Genre Using Multi-View Features

Multimodel Music Emotion Recognition Using Unsupervised Deep Neural Networks

Improving Music Genre Classification from Multi-Modal Properties of Music and Genre Correlations Perspective

Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network

Audio Tonality Mode Classification Without Tonic Annotations

Exploring modality-agnostic representations for music classification

Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better

Multimodal Sentiment Recognition With Multi-Task Learning

Real-Time Human-Music Emotional Interaction Based on Deep Learning and Multimodal Sentiment Analysis

Real-time Human-Music Emotional Interaction Based on Multimodal Analysis