Abstract: Nowadays, people are accustomed to posting images and associated text to express their emotions on social networks. Accordingly, multimodal sentiment analysis has drawn increasing attention. Most existing image-text multimodal sentiment analysis methods only predict sentiment polarity. However, the same sentiment polarity may correspond to quite different emotions, such as happiness vs. excitement and disgust vs. sadness. Sentiment polarity is therefore ambiguous and may not convey the exact emotions that people want to express. Psychological research has shown that objects and words are emotional stimuli and that semantic concepts can affect the role of these stimuli. Inspired by this observation, this paper presents a new MUlti-Level SEmantic Reasoning network (MULSER) for fine-grained image-text multimodal emotion classification. MULSER not only investigates the semantic relationships among objects and among words, respectively, but also explores the semantic relationship between regional objects and global concepts. For the image modality, we first build graphs to extract object-level and global representations, and employ a graph attention module to perform bilevel semantic reasoning. Then, a joint visual graph is built to learn the regional-global semantic relations. For the text modality, we build a word graph and further apply graph attention to reinforce the interdependencies among words in a sentence. Finally, a cross-modal attention fusion module is proposed to fuse the semantically enhanced visual and textual features, yielding informative multimodal representations for fine-grained emotion classification. Experimental results on public datasets demonstrate that the proposed model outperforms state-of-the-art methods.
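The abstract relies on two generic building blocks: graph attention over a graph of nodes (object regions or words) and cross-modal attention that fuses textual and visual features. The plain-Python sketch below illustrates both under stated assumptions. It uses a standard single-head GAT-style layer and scaled dot-product attention in which words query visual nodes. The function names, toy dimensions, and the concatenate-then-mean-pool fusion are hypothetical choices for illustration, not MULSER's actual design; the bilevel reasoning and the joint regional-global visual graph are not modeled.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (out_dim x in_dim) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(nodes: List[Vector], adj: List[List[int]], W: Matrix, a: Vector) -> List[Vector]:
    """Single-head GAT-style attention over a node graph (illustrative sketch).

    nodes: node features (e.g. object regions or word embeddings)
    adj:   adj[i] lists the neighbours of node i (include i for a self-loop)
    W:     shared projection matrix; a: attention vector of length 2 * out_dim
    """
    projected = [matvec(W, h) for h in nodes]
    out = []
    for i, neighbours in enumerate(adj):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]); list '+' concatenates the two vectors
        logits = [leaky_relu(dot(a, projected[i] + projected[j])) for j in neighbours]
        alphas = softmax(logits)
        # h_i' = sum_j alpha_ij * W h_j (no output nonlinearity in this sketch)
        dim = len(projected[i])
        out.append([sum(al * projected[j][d] for al, j in zip(alphas, neighbours))
                    for d in range(dim)])
    return out


def cross_modal_attention(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query (a word) attends over memory (visual nodes)."""
    dim = len(memory[0])
    scale = math.sqrt(dim)
    attended = []
    for q in queries:
        weights = softmax([dot(q, m) / scale for m in memory])
        attended.append([sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)])
    return attended


def fuse(text: List[Vector], visual: List[Vector]) -> Vector:
    """Hypothetical fusion: concatenate each word with its attended visual context, then mean-pool."""
    attended = cross_modal_attention(text, visual)
    pairs = [t + v for t, v in zip(text, attended)]
    return [sum(p[d] for p in pairs) / len(pairs) for d in range(len(pairs[0]))]


if __name__ == "__main__":
    # Toy visual graph: 3 object nodes with 2-d features, fully connected with self-loops.
    objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adj = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection keeps the example readable
    a = [0.5, 0.5, 0.5, 0.5]
    visual = gat_layer(objects, adj, W, a)
    words = [[1.0, 0.0], [0.0, 1.0]]  # toy word embeddings
    print(fuse(words, visual))  # a 4-d multimodal vector
```

Setting the attention vector `a` to zeros makes every neighbour weight equal, so each node becomes the mean of its projected neighbours; this is a quick way to sanity-check the aggregation step before learning `W` and `a`.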

Multimodal Emotion Classification with Multi-Level Semantic Reasoning Network

Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations

Multi-Channel Attentive Graph Convolutional Network with Sentiment Fusion for Multimodal Sentiment Analysis

A Fine-Grained Modal Label-Based Multi-Stage Network for Multimodal Sentiment Analysis.

A Multi-sentiment-resource Enhanced Attention Network for Sentiment Classification

MEDT: Using Multimodal Encoding-Decoding Network as in Transformer for Multimodal Sentiment Analysis