Abstract: Multi-modal sentiment analysis (MSA) is attracting growing attention because it extends conventional text-based sentiment analysis (SA) to multi-modal content, which can provide richer affective information. However, compared with text-based sentiment analysis, multi-modal sentiment analysis poses more challenges, because joint learning on multi-modal data requires both fine-grained semantic matching and effective fusion of heterogeneous features. Existing approaches generally infer the sentiment type by concatenating features extracted from different modalities, but neglect the strong semantic correlation among co-occurring data of different modalities. To address these challenges, a multi-level deep correlative network for multi-modal sentiment analysis is proposed, which reduces the semantic gap by simultaneously analyzing the middle-level semantic features of images and the hierarchical deep correlations. First, the most relevant cross-modal feature representation is generated with multi-modal deep and discriminative correlation analysis (Multi-DDCA), while keeping the respective modal feature representations discriminative. Second, the high-level semantic outputs from Multi-DDCA are encoded into an attention-correlation cross-modal feature representation through a co-attention-based multi-modal correlation submodel, and they are then merged by a multi-layer neural network to train a sentiment classifier for predicting sentiment categories. Extensive experimental results on five datasets demonstrate the effectiveness of the proposed approach, which outperforms several state-of-the-art fusion strategies for sentiment analysis.
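
The co-attention and fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: Multi-DDCA is not implemented, random vectors stand in for the image and text features, and the scaled dot-product scoring, mean pooling, single linear layer, and all names (`attend`, `co_attention_fuse`, the feature sizes) are assumptions chosen only to show the general shape of cross-modal attention followed by fusion and classification.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys):
    """For each query vector, return a softmax-weighted average of the key vectors."""
    dim = len(keys[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        # Scaled dot-product score between this query and every key (assumed scoring function).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)])
    return attended

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def co_attention_fuse(img, txt):
    """Image regions attend over words, words attend over regions; pooled results are concatenated."""
    img_ctx = attend(img, txt)
    txt_ctx = attend(txt, img)
    return mean_pool(img_ctx) + mean_pool(txt_ctx)

random.seed(0)
dim = 8  # hypothetical shared feature size for both modalities
img = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]  # 5 stand-in image regions
txt = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(7)]  # 7 stand-in word vectors

fused = co_attention_fuse(img, txt)

# A single untrained linear layer with 3 hypothetical sentiment classes.
weights = [[random.gauss(0, 1) for _ in range(3)] for _ in range(2 * dim)]
logits = [sum(f * w[c] for f, w in zip(fused, weights)) for c in range(3)]
probs = softmax(logits)
print(len(fused), probs)
```

In a trained system the random inputs would be replaced by learned image and text encoders, and the linear layer by the multi-layer network the abstract mentions.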

A Deep Multi-Level Attentive network for Multimodal Sentiment Analysis

Modality-invariant Temporal Representation Learning for Multimodal Sentiment Classification

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

MultiSentiNet: A Deep Semantic Network for Multimodal Sentiment Analysis

Multi‐level Deep Correlative Networks for Multi‐modal Sentiment Analysis

Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis

Multi-level Attention Map Network for Multimodal Sentiment Analysis

Image-Text Multimodal Emotion Classification via Multi-View Attentional Network

Gated Mechanism for Attention Based Multi Modal Sentiment Analysis

VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification

Multimodal sentiment analysis based on multiple attention

Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis

A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement

Multi-layer cross-modality attention fusion network for multimodal sentiment analysis

Context-Dependent Multimodal Sentiment Analysis Based on a Complex Attention Mechanism

ModalNet: an aspect-level sentiment classification model by exploring multimodal data with fusion discriminant attentional network

Attention-based multi-level image and text sentiment analysis

Exploring Multimodal Sentiment Analysis via CBAM Attention and Double-layer BiLSTM Architecture

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Various syncretic co‐attention network for multimodal sentiment analysis

Multi-Modal Sentiment Analysis Based on Image and Text Fusion Based on Cross-Attention Mechanism