Abstract: Social media has become indispensable to people's lives, allowing them to share their views and emotions through images and text. Analyzing social images for sentiment prediction can help us understand human social behavior and provide better recommendation results. Although current research on image sentiment analysis has made considerable progress, it largely ignores the semantic correlation between an image and its corresponding descriptive sentences (caption). To capture the complementary multimodal information for joint sentiment classification, this paper proposes a novel cross-modal Semantic Content Correlation (SCC) method based on deep matching and hierarchical networks, which bridges the correlation between images and captions. Specifically, pre-trained convolutional neural networks (CNNs) are leveraged to encode the contents of visual sub-regions, and GloVe is employed to embed the textual semantics. Based on the visual contents and textual semantics, a joint attention network is proposed to learn the content correlation between the image and its caption, which is then exported as an image–text pair. To effectively exploit the dependence of visual contents on the textual semantics of the caption, the caption is processed by a Class-Aware Sentence Representation (CASR) network with a class dictionary, and a fully connected layer concatenates the outputs of CASR into a class-aware vector. Finally, the class-aware distributed vector is fed into an Inner-class Dependency Long Short-Term Memory network (IDLSTM), with the image–text pair as a query, to further capture the cross-modal non-linear correlations for sentiment prediction. Extensive experiments conducted on three datasets verify the effectiveness of SCC.
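The joint attention step that pairs visual sub-regions with caption words can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following is only an assumption-laden illustration, not the SCC implementation: it assumes region features have already been projected to the same dimension as the word embeddings, uses plain dot-product affinities, and scores each region (or word) by its best match in the other modality, in the style of attentive pooling. The function `joint_attention` and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, rows: Matrix) -> Vector:
    """Attention-weighted sum of row vectors."""
    dim = len(rows[0])
    return [sum(w * row[k] for w, row in zip(weights, rows)) for k in range(dim)]


def joint_attention(regions: Matrix, words: Matrix) -> Vector:
    """Hypothetical co-attention between image regions and caption words.

    regions: one d-dim feature per visual sub-region (assumed already
             projected from CNN feature space to the embedding size d).
    words:   one d-dim embedding per caption token (e.g. GloVe vectors).
    Returns [attended image vector ; attended text vector] as one 2d-dim list,
    standing in for the "image-text pair".
    """
    # Affinity matrix: affinity[i][j] = <region_i, word_j>
    affinity = [[dot(r, w) for w in words] for r in regions]
    # Score each region by its best-matching word (row-wise max) ...
    region_scores = [max(row) for row in affinity]
    # ... and each word by its best-matching region (column-wise max)
    word_scores = [max(col) for col in zip(*affinity)]
    image_vec = weighted_sum(softmax(region_scores), regions)
    text_vec = weighted_sum(softmax(word_scores), words)
    return image_vec + text_vec  # list concatenation


if __name__ == "__main__":
    # Toy example: three 2-d "region" features and two 2-d "word" embeddings
    toy_regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    toy_words = [[1.0, 0.0], [0.2, 0.1]]
    print(joint_attention(toy_regions, toy_words))
```

Max-pooling the affinity matrix lets a region that is strongly described by any single caption word receive a high weight; a learned bilinear weight between the two modalities, as in attentive pooling, would be a natural extension but is omitted here to keep the sketch dependency-free.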

Sentiment Analysis of Social Images via Hierarchical Deep Fusion of Content and Links

Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

Image-Text Sentiment Analysis via Deep Multimodal Attentive Fusion

Visual-Textual Sentiment Analysis Enhanced by Hierarchical Cross-Modality Interaction

Image Sentiment Analysis Method Based on Multi-Level Feature Fusion

Visual and Textual Sentiment Analysis Using Deep Fusion Convolutional Neural Networks

Exploring Multimodal Multiscale Features for Sentiment Analysis Using Fuzzy-Deep Neural Network Learning

Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion

Cross-Modal Image Sentiment Analysis via Deep Correlation of Textual Semantic

A Cross-Modal Hierarchical Fusion Multimodal Sentiment Analysis Method Based on Multi-Task Learning

Hierarchical Fusion Network with Enhanced Knowledge and Contrastive Learning for Multimodal Aspect-Based Sentiment Analysis on Social Media

Visual-Textual Sentiment Classification with Bi-Directional Multi-Level Attention Networks

Multimodal Sentiment Analysis Using Multi-Tensor Fusion Network with Cross-Modal Modeling

A Hierarchical LSTM Model with Multiple Features for Sentiment Analysis of Sina Weibo Texts

Transfer Correlation Between Textual Content to Images for Sentiment Analysis

Fusion-Extraction Network for Multimodal Sentiment Analysis

From Content to Links: Social Image Embedding with Deep Multimodal Model

Scanning, Attention, and Reasoning Multimodal Content for Sentiment Analysis

A Cross-Model Hierarchical Interactive Fusion Network for End-to-End Multimodal Aspect-Based Sentiment Analysis

A Multi-level Style Feature Model for Visual Sentiment Analysis