Abstract: Bag-of-visual-words (BoW) image representation has been shown to be one of the most promising solutions for large-scale near-duplicate image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features. This is not ideal, because it largely ignores the semantic and spatial contexts between local features. In this paper, we propose the geometric visual vocabulary, which captures spatial context by quantizing local features in a bi-space, i.e., in both descriptor space and orientation space. We then capture semantic context by learning a semantic-aware distance metric between local features, which measures the semantic similarity between the image patches from which the local features are extracted. The learned distance is then used to cluster the local features and generate a semantic visual vocabulary. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups, producing a novel visual vocabulary: the contextual visual vocabulary. The three proposed vocabularies (geometric, semantic, and contextual) are tested in large-scale near-duplicate image retrieval applications. The geometric and semantic visual vocabularies achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency.
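The abstract does not spell out how quantization in descriptor space and orientation space is combined. The sketch below is one minimal reading, under stated assumptions: a flat codebook of descriptor centroids (as k-means would produce), uniform orientation bins over [0, 2π), and a joint word id formed from the pair (descriptor word, orientation bin). All function names and parameters here are hypothetical illustrations, not the paper's implementation.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to `descriptor` (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centroid in enumerate(codebook):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def orientation_bin(angle, n_bins):
    """Map an orientation in radians to one of `n_bins` equal sectors of [0, 2*pi)."""
    angle = angle % (2 * math.pi)  # wrap negative or large angles into [0, 2*pi)
    return min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)

def bispace_word(descriptor, angle, codebook, n_bins):
    """Hypothetical joint visual word id: descriptor word combined with orientation bin."""
    return nearest_word(descriptor, codebook) * n_bins + orientation_bin(angle, n_bins)

def bow_histogram(features, codebook, n_bins):
    """Count bi-space visual words over an image's (descriptor, angle) features."""
    hist = [0] * (len(codebook) * n_bins)
    for descriptor, angle in features:
        hist[bispace_word(descriptor, angle, codebook, n_bins)] += 1
    return hist

# Toy usage: 2 descriptor words x 4 orientation bins = 8 joint words.
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
toy_features = [([0.1, 0.0], 0.2), ([0.9, 1.1], math.pi), ([0.0, 0.1], 0.3)]
print(bow_histogram(toy_features, toy_codebook, 4))
```

Two features that share a descriptor word but differ in orientation now fall into different joint words, which is the kind of extra spatial discrimination the geometric vocabulary aims for. The trade-off of any hard binning like this is that orientations near a bin boundary can split otherwise matching features.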
