Abstract: The bag-of-visual-words (BoW) image representation has proven to be one of the most promising solutions for large-scale near-duplicated image retrieval. However, the traditional visual vocabulary is created in an unsupervised way by clustering a large number of local image features. This is not ideal because it largely ignores the semantic and spatial contexts between local features. In this paper, we first propose the geometric visual vocabulary, which captures spatial context by quantizing local features in bi-space, i.e., in both descriptor space and orientation space. Then, we propose to capture semantic context by learning a semantic-aware distance metric between local features, which can reasonably measure the semantic similarity between the image patches from which the local features are extracted. The learned distance is then used to cluster the local features to generate the semantic visual vocabulary. Finally, we combine the spatial and semantic contexts in a unified framework by extracting local feature groups, computing the spatial configurations between the local features inside each group, and learning a semantic-aware distance between groups. The learned group distance is then used to cluster the extracted local feature groups to generate a novel visual vocabulary, i.e., the contextual visual vocabulary. The proposed visual vocabularies, i.e., the geometric, semantic, and contextual visual vocabularies, are tested in large-scale near-duplicated image retrieval applications. The geometric visual vocabulary and the semantic visual vocabulary achieve better performance than the traditional visual vocabulary. Moreover, the contextual visual vocabulary, which combines both spatial and semantic cues, outperforms the state-of-the-art bundled feature in both retrieval precision and efficiency.
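
To make the idea of quantizing a local feature in both descriptor space and orientation space more concrete, the following is a minimal sketch and not the paper's actual method. It assumes that each feature carries a descriptor vector and a dominant orientation, that descriptor quantization is a nearest-centroid lookup, and that orientation quantization uses equal-width angular bins. The function names, the bin count, and the way the two indices are combined into one word ID are all hypothetical choices made for illustration.

```python
import math


def nearest_centroid(descriptor, centroids):
    """Return the index of the centroid closest to descriptor (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, centroid in enumerate(centroids):
        dist = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def orientation_bin(angle_rad, n_bins):
    """Map an angle in radians to one of n_bins equal-width bins over [0, 2*pi)."""
    wrapped = angle_rad % (2.0 * math.pi)
    # Clamp guards against floating-point rounding at the upper edge.
    return min(int(wrapped / (2.0 * math.pi) * n_bins), n_bins - 1)


def geometric_word(descriptor, angle_rad, centroids, n_bins):
    """Combine the descriptor word and the orientation bin into a single word ID.

    Assumed encoding (hypothetical): word_id = descriptor_word * n_bins + bin.
    """
    descriptor_word = nearest_centroid(descriptor, centroids)
    return descriptor_word * n_bins + orientation_bin(angle_rad, n_bins)


# Toy 2-D descriptors; real local descriptors are much higher-dimensional.
centroids = [[0.0, 0.0], [1.0, 1.0]]

# Two features with the same descriptor but different orientations.
word_a = geometric_word([0.9, 1.1], math.radians(45), centroids, n_bins=4)
word_b = geometric_word([0.9, 1.1], math.radians(300), centroids, n_bins=4)
print(word_a, word_b)
```

In this sketch the two features fall into the same descriptor cluster but receive different word IDs because their orientations land in different bins, which is the sense in which such a vocabulary can separate features that a descriptor-only vocabulary would merge.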

Latent Dirichlet Allocation Based Image Retrieval

Image retrieval based on incremental subspace learning

Probabilistic Latent Semantic Analysis for Sketch-Based 3D Model Retrieval

Domain-Specific Modeling and Semantic Alignment for Image-Based 3d Model Retrieval

Allocating images and selecting image collections for distributed visual search

A Combination of Bag of Categorized Visual Words and Tag Voting Based Image Retrieval

LDA-Based Retrieval Framework for Semantic News Video Retrieval

A Study of Language Model for Image Retrieval

Topic Level Sampling Towards Optimized Locality Sensitive Vocabulary Coding

Modeling spatial and semantic cues for large-scale near-duplicated image retrieval

Bag-Of-Words Based Deep Neural Network For Image Retrieval

Modeling Image Data for Effective Indexing and Retrieval in Large General Image Databases.

Sketch-Based Image Retrieval with a Novel BoVW Representation.

Latent Visual Context Learning for Web Image Applications

Multi-modal Auto-regressive Modeling via Visual Words

Regularized Semi-Supervised Latent Dirichlet Allocation for Visual Concept Learning

VLAD Re-Ranking: Iteratively Estimating the Probability of Relevance with Relationships Between Dataset Images

Visual language modeling for image classification.

A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval

Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Scale-invariant visual language modeling for object categorization