Abstract:To bridge the semantic gap between low level feature and human perception, most of the existing algorithms aim mainly at annotating images with concepts coming from only one semantic space, e.g. cognitive or affective. The naive combination of the outputs from these spaces will implicitly force the conditional independence and ignore the correlations among the spaces. In this paper, to exploit the comprehensive semantic of images, we propose a general framework for harmoniously integrating the above multiple semantics, and investigating the problem of learning to annotate images with training images labeled in two or more correlated semantic spaces, such as fascinating nighttime, or exciting cat. This kind of semantic annotation is more oriented to real world search scenario. Our proposed approach outperforms the baseline algorithms by making the following contributions. 1) Unlike previous methods that annotate images within only one semantic space, our proposed multi-semantic annotation associates each image with labels from multiple semantic spaces. 2) We develop a multi-task linear discriminative model to learn a linear mapping from features to labels. The tasks are correlated by imposing the exclusive group lasso regularization for competitive feature selection, and the graph Laplacian regularization to deal with insufficient training sample issue. 3) A Nesterov-type smoothing approximation algorithm is presented for efficient optimization of our model. Extensive experiments on NUS-WIDEEmotive dataset (56k images) with 8×81 emotive cognitive concepts and Object&Scene datasets from NUS-WIDE well validate the effectiveness of the proposed approach.

Learning Semantic Embedding At A Large Scale

Semantic Image Retrieval Based on Multiple-Instance Learning

Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models

Dual Collaborative Visual-Semantic Mapping for Multi-Label Zero-Shot Image Recognition

Learning Robust Visual-Semantic Embeddings

Towards Semantic Embedding In Visual Vocabulary

Semantic Communication Approach for Multi-Task Image Transmission

Learning Structured Semantic Embeddings for Visual Recognition

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases.

Image Tagging Via Cross-Modal Semantic Mapping

Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings

A Semantic-Based Method for Visualizing Large Image Collections.

Learning semantic sentence representations from visually grounded language without lexical knowledge

Semantic context learning with large-scale weakly-labeled image set.

Towards Multi-Semantic Image Annotation with Graph Regularized Exclusive Group Lasso

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence

Learning Binary Semantic Embedding Forlarge-Scale Breast Histology Image Analysis.

Multi-view visual semantic embedding for cross-modal image–text retrieval

Learning Deep Structure-Preserving Image-Text Embeddings