Abstract: The bag-of-visual-words (BoW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicated image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features. This is not ideal because it largely ignores the semantic and spatial contexts between local features. In this paper, we propose the geometric visual vocabulary, which captures the spatial context by quantizing local features in bi-space, i.e., in descriptor space and orientation space. Then, we propose to capture the semantic context by learning a semantic-aware distance metric between local features, which reasonably measures the semantic similarity between the image patches from which the local features are extracted. The learned distance is then used to cluster the local features for semantic visual vocabulary generation. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups to generate a novel visual vocabulary, i.e., the contextual visual vocabulary. The proposed visual vocabularies, i.e., the geometric visual vocabulary, the semantic visual vocabulary, and the contextual visual vocabulary, are tested in large-scale near-duplicated image retrieval applications. The geometric visual vocabulary and the semantic visual vocabulary achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency.
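As a rough illustration of bi-space quantization, the following minimal sketch assigns a local feature to a geometric visual word by combining a descriptor-space cell with an orientation-space cell. The abstract does not specify the actual quantizers, so everything here is an assumption: nearest-centroid assignment stands in for descriptor quantization, equal-width angular bins stand in for orientation quantization, and the function names (`nearest_centroid`, `orientation_bin`, `geometric_word`) are hypothetical.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Index of the centroid closest to `descriptor` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def orientation_bin(angle_rad, num_bins):
    """Map an angle in radians to one of `num_bins` equal-width bins over [0, 2*pi)."""
    wrapped = angle_rad % (2 * math.pi)  # fold negative or large angles into range
    # The min() guards against rounding that could push the index to num_bins.
    return min(int(wrapped / (2 * math.pi) * num_bins), num_bins - 1)


def geometric_word(descriptor, angle_rad, centroids, num_bins):
    """Combine the descriptor cell and the orientation cell into a single word id.

    Assumed encoding (not from the paper): word = descriptor_cell * num_bins + bin.
    """
    cell = nearest_centroid(descriptor, centroids)
    return cell * num_bins + orientation_bin(angle_rad, num_bins)


# Toy 2-D "descriptors"; real local descriptors have far more dimensions.
centroids = [[0.0, 0.0], [1.0, 1.0]]
word_a = geometric_word([0.9, 1.2], math.pi / 2, centroids, num_bins=4)
word_b = geometric_word([0.1, 0.0], -0.1, centroids, num_bins=4)
```

Under this assumed encoding, two features share a geometric visual word only if they fall in the same descriptor cell and the same orientation bin, which is one simple way to make the vocabulary sensitive to orientation as well as appearance.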

Spatial-Content Image Search In Complex Scenes

Composition Based Semantic Scene Retrieval For Ancient Murals

Scene image retrieval via re-ranking semantic and packed dense interest points

Search by mobile image based on visual and spatial consistency

A Retrieval Mechanism for Complex Similarity Queries in Image Databases

Facilitating Image Search with a Scalable and Compact Semantic Mapping

Object-Based Image Retrieval Using Spatial Context

Learning similarity for image retrieval with locally spatial information feedback

Complex spatial region representation and similar matching for multi-object image retrieval

Spatial Coding for Large Scale Partial-Duplicate Web Image Search

Scene Image Retrieval with Siamese Spatial Attention Pooling

SVS-JOIN: Efficient Spatial Visual Similarity Join over Multimedia Data

Visual Vocabulary Optimization with Spatial Context for Image Annotation and Classification

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval

An Image Retrieval Combining Color and Spatial Information

Exploring Entity-Level Spatial Relationships for Image-Text Matching

Visual Content Recognition by Exploiting Semantic Feature Map with Attention and Multi-task Learning

Spatial similarity retrieval of symbolic images with repeated symbols

Spatial Verification for Scalable Mobile Image Retrieval

A survey of content-based image retrieval with high-level semantics

Contextual Hashing for Large-Scale Image Search