Abstract: Social media users increasingly express their feelings by sharing images online. Capturing the emotions embedded in these social images poses significant research challenges and has considerable practical value. Most existing works concentrate on extracting visual features from a global view, ignoring the fact that visual objects are also rich in emotion. How to leverage multilevel visual features to improve sentiment analysis performance is important yet challenging. Moreover, existing works treat each social image as an independent sample and ignore the rich correlations among social images, which may be helpful in detecting visual emotion. In this article, we propose a novel model called social relations-guided multiattention networks (SRGMANs), which incorporates both the multilevel (region-level and object-level) visual features of a single image and the correlations among multiple social images to conduct visual sentiment analysis. Specifically, we first construct a heterogeneous network consisting of various types of social relations and introduce a heterogeneous network embedding method to learn the network representation of each image. Then, two visual attention branches (a region attention network and an object attention network) are devised to extract emotional and discriminative visual features. For each branch, we design a self-attention module to capture the emotional dependencies among visual parts. In addition, a network-guided attention module in each branch focuses on the emotional visual parts that are more network-related, guided by the topology information. Finally, the attended visual features from the two attention branches, together with the network representation features, are combined within a holistic framework to predict the sentiment of social images. Extensive experiments on three benchmark datasets demonstrate the superiority of our model.
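
The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the scaled dot-product form of both attention modules, the projection matrices `W_r` and `W_o`, the feature dimensions, and the concatenation-based fusion are all assumptions made for illustration, and random vectors stand in for real region features, object features, and the heterogeneous network embedding.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(parts):
    # Scaled dot-product self-attention over visual parts, shape (n_parts, d).
    # Assumed form; the abstract only says dependencies among parts are captured.
    d = parts.shape[1]
    weights = softmax(parts @ parts.T / np.sqrt(d), axis=-1)
    return weights @ parts

def network_guided_attention(parts, net_emb, W):
    # Score each part against the image's network embedding, then pool.
    # W (d, k) is a hypothetical projection from visual to network space.
    scores = (parts @ W) @ net_emb / np.sqrt(net_emb.shape[0])
    weights = softmax(scores)
    return weights @ parts, weights

def branch(parts, net_emb, W):
    # One attention branch: self-attention followed by network-guided pooling.
    attended = self_attention(parts)
    return network_guided_attention(attended, net_emb, W)

def fuse(region_feat, object_feat, net_emb):
    # Assumed fusion: plain concatenation, to be fed to a sentiment classifier.
    return np.concatenate([region_feat, object_feat, net_emb])

rng = np.random.default_rng(0)
d, k = 8, 4                         # illustrative feature sizes
regions = rng.normal(size=(5, d))   # stand-in region-level features
objects = rng.normal(size=(3, d))   # stand-in object-level features
net_emb = rng.normal(size=k)        # stand-in network representation
W_r = rng.normal(size=(d, k))       # hypothetical projection, region branch
W_o = rng.normal(size=(d, k))       # hypothetical projection, object branch

region_feat, region_w = branch(regions, net_emb, W_r)
object_feat, object_w = branch(objects, net_emb, W_o)
fused = fuse(region_feat, object_feat, net_emb)
```

In this sketch `fused` has length `2 * d + k`, and `region_w` and `object_w` are the per-part attention weights, which could be inspected to see which regions or objects the network embedding emphasizes.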

Visual Sentiment Analysis by Leveraging Local Regions and Human Faces

Visual Sentiment Prediction Based on Automatic Discovery of Affective Regions

Discovering Affective Regions in Deep Convolutional Neural Networks for Visual Sentiment Prediction

Multi-scale Features Enhanced Sentiment Region Discovery for Visual Sentiment Analysis

Visual sentiment analysis with semantic correlation enhancement

The Role of Visual Attention in Sentiment Prediction

CausVSR: Causality Inspired Visual Sentiment Recognition

A Multi-Attentive Pyramidal Model for Visual Sentiment Analysis

Stimuli-Aware Visual Emotion Analysis

EERCA-ViT: Enhanced Effective Region and Context-Aware Vision Transformers for Image Sentiment Analysis

Human Emotion Recognition With Relational Region-Level Analysis

Emotion Recognition via Environmental Context and Human Body

Visual Sentiment Analysis With Social Relations-Guided Multiattention Networks

IVSA: Facial Expression Recognition Method with Salient Attention

Image Sentiment Classification Via Multi-Level Sentiment Region Correlation Analysis

Expression Analysis Based on Face Regions in Real-world Conditions

Video Action Recognition with Attentive Semantic Units

Visual Saliency Maps Can Apply to Facial Expression Recognition

Joint Visual-Textual Sentiment Analysis Based on Cross-Modality Attention Mechanism

Multi-modal Feature Distillation Emotion Recognition Method for Social Media

Probing Sentiment-Oriented Pre-Training Inspired by Human Sentiment Perception Mechanism