Abstract:<p>Social media has become indispensable to people's lives, giving them a place to share their views and emotions through images and text. Analyzing social images for sentiment prediction can help in understanding human social behavior and in providing better recommendation results. Current research on image sentiment analysis has made good progress, but most of it ignores the semantic correlation between an image and its corresponding descriptive sentence (caption). To capture the complementary multimodal information for joint sentiment classification, we propose a novel cross-modal Semantic Content Correlation (SCC) method based on deep matching and hierarchical networks, which bridges the correlation between images and captions. Specifically, pre-trained convolutional neural networks (CNNs) are leveraged to encode the contents of visual sub-regions, and GloVe is employed to embed the textual semantics. Relying on the visual contents and textual semantics, a joint attention network is proposed to learn the content correlation between the image and its caption, which is then exported as an image-text pair. To effectively exploit the dependence of visual contents on the textual semantics of the caption, the caption is processed by a Class-Aware Sentence Representation (CASR) network with a class dictionary, and a fully connected layer concatenates the outputs of CASR into a class-aware vector. Finally, the class-aware distributed vector is fed into an Inner-class Dependency Long Short-Term Memory network (IDLSTM), with the image-text pair as a query, to further capture the cross-modal non-linear correlations for sentiment prediction. Extensive experiments conducted on three datasets verify the effectiveness of the SCC model.</p>
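
To make the joint attention step more concrete, the following is a minimal sketch of one common way to attend jointly over image regions and caption words. The abstract does not specify the form of the attention, so this is an assumption throughout: it uses dot-product affinity, max-pooling over the affinity matrix, and concatenation of the two attended vectors. The function name `joint_attention` and the toy vectors are hypothetical, standing in for CNN region features and GloVe word embeddings projected to a shared dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Weighted average of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]


def joint_attention(regions, words):
    """Hypothetical co-attention over region and word vectors.

    regions: list of region feature vectors (assumed CNN outputs)
    words:   list of word vectors (assumed GloVe embeddings)
    Both are assumed to share the same dimensionality.
    """
    # affinity[i][j]: similarity between region i and word j
    affinity = [[dot(r, w) for w in words] for r in regions]
    # Score each region by its best-matching word, and vice versa
    region_scores = [max(row) for row in affinity]
    word_scores = [max(affinity[i][j] for i in range(len(regions)))
                   for j in range(len(words))]
    region_weights = softmax(region_scores)
    word_weights = softmax(word_scores)
    # Attended visual and textual vectors, concatenated into one
    # "image-text pair" representation (an assumed fusion choice)
    visual = weighted_sum(region_weights, regions)
    textual = weighted_sum(word_weights, words)
    return visual + textual, region_weights, word_weights


# Toy inputs: two regions and three words in a 2-d shared space
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.5]]
pair, region_weights, word_weights = joint_attention(regions, words)
```

In this toy input the first region aligns with two of the three words, so it receives the larger attention weight. A trained model would learn projection matrices rather than use raw dot products.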

Image Annotation by Incorporating Word Correlations into Multi-Class SVM

Automatic Image Annotation Based on Wordnet and Hierarchical Ensembles

Correlative multi-label multi-instance image annotation

Automatic image annotation by an iterative approach: incorporating keyword correlations and region matching

Automatic Image Annotation Based-On Model Space

Automatic image annotation via local multi-label classification

Towards Multi-Semantic Image Annotation with Graph Regularized Exclusive Group Lasso

Image annotation using the summation of negative probability based on SVM

Improve Image Annotation by Combining Multiple Models

Context-Based Support Vector Machines for Interconnected Image Annotation

An SVDD-Based Automatic Image Annotation Method

Image Annotations Based on Semi-supervised Clustering with Semantic Soft Constraints

The image annotation algorithm using convolutional features from intermediate layer of deep learning

Automatic image annotation based on salient regions

Content-Based Image Orientation Detection with Support Vector Machines

Correlative Multi-Label Video Annotation

Cross-modal image sentiment analysis via deep correlation of textual semantic

Image Annotation In A Progressive Way

Automatic web image annotation via web-scale image semantic space learning

Multi-Modal Multi-Label Semantic Indexing of Images Using Unlabeled Data

Multi-Graph Similarity Reinforcement For Image Annotation Refinement