Abstract: Bag-of-visual-words (BoW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicated image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features. This is not ideal because it largely ignores the semantic and spatial contexts between local features. In this paper, we propose the geometric visual vocabulary, which captures the spatial contexts by quantizing local features in bi-space, i.e., in descriptor space and orientation space. Then, we propose to capture the semantic context by learning a semantic-aware distance metric between local features, which can reasonably measure the semantic similarities between the image patches from which the local features are extracted. The learned distance is then used to cluster the local features for semantic visual vocabulary generation. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups to generate a novel visual vocabulary, i.e., the contextual visual vocabulary. The proposed visual vocabularies, i.e., the geometric, semantic, and contextual visual vocabularies, are tested in large-scale near-duplicated image retrieval applications. The geometric visual vocabulary and semantic visual vocabulary achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic clues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency.
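To make the bi-space quantization idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that descriptor-space centroids are already available (for example from k-means), that orientations are given in radians, and that orientation space is split into uniform bins; the function names, the parameter n_orientation_bins, and the way the two indices are combined into one word id are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_bispace(descriptors, orientations, centroids, n_orientation_bins=8):
    """Assign each local feature a joint (descriptor, orientation) word id.

    Illustrative assumption: the joint id is
    descriptor_word * n_orientation_bins + orientation_bin.
    """
    # Nearest centroid in descriptor space (squared Euclidean distance).
    diffs = descriptors[:, None, :] - centroids[None, :, :]
    desc_word = np.argmin((diffs ** 2).sum(axis=2), axis=1)

    # Uniform binning of orientation angles over [0, 2*pi).
    angles = np.mod(orientations, 2 * np.pi)
    ori_bin = np.floor(angles / (2 * np.pi) * n_orientation_bins).astype(int)
    # Guard against floating-point rounding pushing an angle into bin n.
    ori_bin = np.minimum(ori_bin, n_orientation_bins - 1)

    return desc_word * n_orientation_bins + ori_bin

def bow_histogram(word_ids, vocab_size):
    """Count how often each joint visual word occurs in one image."""
    return np.bincount(word_ids, minlength=vocab_size)

# Toy data: two descriptor centroids and four orientation bins.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 1.0]])
orientations = np.array([0.1, np.pi + 0.1, 2 * np.pi - 0.1])

word_ids = quantize_bispace(descriptors, orientations, centroids, n_orientation_bins=4)
histogram = bow_histogram(word_ids, vocab_size=len(centroids) * 4)
```

Under this scheme, two features with near-identical descriptors but different dominant orientations fall into different visual words, which is one simple way to let the vocabulary carry some spatial information.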

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval