Abstract: The MPEG Compact Descriptors for Visual Search (CDVS) standard targets image matching and retrieval. Recent research has shown that extremely high-dimensional descriptors such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) yield high retrieval accuracy over large-scale image and video datasets. Because the FV (or VLAD) offers high discriminability while requiring only a small visual vocabulary, CDVS adopted it to construct a global compact descriptor. In this paper, we study the development of global compact descriptors in the completed CDVS standard and the emerging Compact Descriptors for Video Analysis (CDVA) standard, and we formulate FV (or VLAD) compression as a resource-constrained optimization problem. Accordingly, we propose a codebook-free aggregation method via dual selection to generate a global compact visual descriptor. The method supports fast and accurate feature matching without large visual codebooks, meeting the low memory requirement of mobile visual search at significantly reduced latency. Specifically, we investigate both sample-specific Gaussian component redundancy and bit dependency within a binary aggregated descriptor to produce compact binary codes. Our technique contributes to the scalable compressed Fisher vector (SCFV) adopted by the CDVS standard. Moreover, the SCFV descriptor currently serves as the frame-level hand-crafted video feature, which motivates carrying CDVS descriptors over into the emerging CDVA standard. Furthermore, we investigate the complementary effect of our standard-compliant compact descriptor and deep-learning-based features extracted from convolutional neural networks, and observe significant gains in mean average precision. Extensive evaluation on benchmark databases shows the merits of the codebook-free binary codes for scalable visual search.
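To make the idea of a binary aggregated descriptor with sample-specific component selection concrete, the following is a minimal sketch and not the SCFV procedure itself. It assumes hard assignment to a small set of centroids (a VLAD-style stand-in for the Gaussian components), sign binarization of the aggregated residuals, and selection of the most populated components per image. The function names `binary_aggregate` and `hamming_similarity`, the `keep` parameter, and the count-based selection rule are all illustrative assumptions; the bit-level selection mentioned in the abstract is not modeled here.

```python
import numpy as np

def binary_aggregate(local_desc, centroids, keep=4):
    """Aggregate local descriptors into a sign-binarized, VLAD-style code.

    Returns (bits, mask): bits has shape (K, D) with values in {0, 1};
    mask has shape (K,) and marks the components selected for this image.
    The count-based selection rule is an assumption made for illustration.
    """
    K, D = centroids.shape
    # Hard-assign each local descriptor to its nearest centroid.
    d2 = ((local_desc[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    residual = np.zeros((K, D))
    counts = np.zeros(K, dtype=int)
    for k in range(K):
        members = local_desc[assign == k]
        counts[k] = len(members)
        if len(members):
            # Sum of residuals to the centroid, as in VLAD aggregation.
            residual[k] = (members - centroids[k]).sum(axis=0)
    # One bit per dimension: the sign of the aggregated residual.
    bits = (residual > 0).astype(np.uint8)
    # Sample-specific component selection: keep the most populated components.
    mask = np.zeros(K, dtype=bool)
    mask[np.argsort(-counts, kind="stable")[:keep]] = True
    mask &= counts > 0
    return bits, mask

def hamming_similarity(bits_a, mask_a, bits_b, mask_b):
    """Fraction of agreeing bits over components selected in both images."""
    common = mask_a & mask_b
    if not common.any():
        return 0.0
    agree = (bits_a[common] == bits_b[common]).sum()
    return float(agree) / float(common.sum() * bits_a.shape[1])

# Synthetic data stands in for real local features and a trained vocabulary.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(8, 16))
image_desc = rng.normal(size=(200, 16))
bits, mask = binary_aggregate(image_desc, centroids, keep=4)
sim = hamming_similarity(bits, mask, bits, mask)
```

Matching only over components selected in both images keeps the comparison to a few short bit strings, which is the property that makes such codes attractive when memory and latency are constrained.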

Weighted Two-Step Aggregated VLAD for Image Retrieval

Image Representation Optimization Based on Locally Aggregated Descriptors

Fine-residual VLAD for Image Retrieval

A Compact Binary Aggregated Descriptor Via Dual Selection for Visual Search

Hierarchical Multi-Vlad For Image Retrieval

Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics

A Simple but Efficient Way to Combine Vlad with Locality-Constrained Linear Coding

Multi-stage vector quantization towards low bit rate visual search

Making Residual Vector Distribution Uniform for Distinctive Image Representation

Deep Visual-Semantic Quantization For Efficient Image Retrieval

Spatial pyramid VLAD

Weighted Component Hashing of Binary Aggregated Descriptors for Fast Visual Search

VLAD Re-Ranking: Iteratively Estimating the Probability of Relevance with Relationships Between Dataset Images

VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition

Codebook-Free Compact Descriptor for Scalable Visual Search

Image retrieval based on aggregated deep features weighted by regional significance and channel sensitivity

Unsupervised Part-based Weighting Aggregation of Deep Convolutional Features for Image Retrieval

Topic Level Sampling Towards Optimized Locality Sensitive Vocabulary Coding

Sorting Local Descriptors for Lowbit Rate Mobile Visual Search

Democratic Diffusion Aggregation for Image Retrieval

Evaluating Inverted Files for Visual Compact Codes on a Large Scale