Abstract: Social media users increasingly express their feelings by sharing images online. Capturing the emotions embedded in these social images poses substantial research challenges and has considerable practical value. Most existing works concentrate on extracting visual features from a global view, ignoring the fact that individual visual objects are also rich in emotion. How to leverage multilevel visual features to improve sentiment analysis performance is important yet challenging. Moreover, existing works treat each social image as an independent sample and ignore the rich correlations among social images, which may help in detecting visual emotion. In this article, we propose a novel model called social relations-guided multiattention networks (SRGMANs), which incorporates both the multilevel (region-level and object-level) visual features of a single image and the correlations among multiple social images to conduct visual sentiment analysis. Specifically, we first construct a heterogeneous network consisting of various types of social relations and introduce a heterogeneous network embedding method to learn the network representation of each image. Then, two visual attention branches (a region attention network and an object attention network) are devised to extract emotional and discriminative visual features. For each branch, we design a self-attention module to capture the emotional dependencies among visual parts. In addition, a network-guided attention module in each branch uses the topology information to focus on the emotional visual parts most related to the network. Finally, the attended visual features from the two attention branches, together with the network representation features, are combined within a holistic framework to predict the sentiment of social images. Extensive experiments on three benchmark datasets demonstrate the superiority of our model.
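
The data flow described in the abstract (self-attention among visual parts, then attention guided by the network representation, then fusion of both branches with the network features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the bilinear scoring with projection matrices `W_r` and `W_o`, the feature dimensions, and the random inputs are all assumptions chosen only for illustration, and the final sentiment classifier is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(parts):
    # Scaled dot-product self-attention among visual parts.
    # parts: (n_parts, d); each output row mixes all parts by similarity.
    d = parts.shape[-1]
    weights = softmax(parts @ parts.T / np.sqrt(d), axis=-1)
    return weights @ parts

def network_guided_attention(parts, net_emb, W):
    # Score each visual part against the image's network embedding through
    # a hypothetical bilinear projection W of shape (d, k), then pool.
    weights = softmax(parts @ W @ net_emb, axis=0)  # (n_parts,)
    return weights @ parts, weights

def fuse(region_parts, object_parts, net_emb, W_r, W_o):
    # Run both branches and concatenate them with the network features.
    r, _ = network_guided_attention(self_attention(region_parts), net_emb, W_r)
    o, _ = network_guided_attention(self_attention(object_parts), net_emb, W_o)
    return np.concatenate([r, o, net_emb])

# Random stand-ins for region features, object features, and the
# heterogeneous network embedding of one image (all assumed shapes).
rng = np.random.default_rng(0)
d, k = 8, 4
regions = rng.normal(size=(5, d))
objects = rng.normal(size=(3, d))
net_emb = rng.normal(size=k)
W_r = rng.normal(size=(d, k))
W_o = rng.normal(size=(d, k))

features = fuse(regions, objects, net_emb, W_r, W_o)
print(features.shape)  # (20,)
```

In this sketch the fused vector has length d + d + k; in a full model it would feed a sentiment classifier, and the projections would be learned rather than sampled.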

Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism

Visual-Textual Sentiment Analysis Enhanced by Hierarchical Cross-Modality Interaction

Visual-textual Sentiment Classification with Bi-Directional Multi-Level Attention Networks

A Multimodal Sentiment Analysis Approach Based on a Joint Chained Interactive Attention Mechanism

Learning Sentiment Sentence Representation with Multiview Attention Model

Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations

Multi-modal Sentiment and Emotion Joint Analysis with a Deep Attentive Multi-task Learning Model

Holistic Visual-Textual Sentiment Analysis with Prior Models

Jointly Learning Attentions With Semantic Cross-Modal Correlation For Visual Question Answering

Video Sentiment Analysis with Bimodal Information-augmented Multi-Head Attention

Multimodal sentiment analysis based on multi-head attention mechanism

Cross-Modality Sentiment Analysis for Social Multimedia

Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis

A multimodal sentiment recognition method based on attention mechanism

Deep Coordinated Textual and Visual Network for Sentiment-Oriented Cross-Modal Retrieval

Visual Sentiment Analysis With Social Relations-Guided Multiattention Networks

Various syncretic co‐attention network for multimodal sentiment analysis

Multi-layer cross-modality attention fusion network for multimodal sentiment analysis

A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis

Joint Multi-modal Aspect-Sentiment Analysis with Auxiliary Cross-modal Relation Detection