Abstract: The MPEG Compact Descriptors for Visual Search (CDVS) standard targets image matching and retrieval. To achieve high retrieval accuracy over large-scale image and video datasets, recent research has shown that extremely high-dimensional descriptors such as the Fisher vector (FV) and the vector of locally aggregated descriptors (VLAD) yield good performance. Because the FV (or VLAD) is highly discriminative while requiring only a small visual vocabulary, CDVS adopted it to construct a global compact descriptor. In this paper, we study the development of global compact descriptors in the completed CDVS standard and the emerging Compact Descriptors for Video Analysis (CDVA) standard, formulating FV (or VLAD) compression as a resource-constrained optimization problem. Accordingly, we propose a codebook-free aggregation method based on dual selection to generate a global compact visual descriptor. It supports fast and accurate feature matching without large visual codebooks, meeting the low memory requirements of mobile visual search at significantly reduced latency. Specifically, we investigate both sample-specific Gaussian component redundancy and bit dependency within a binary aggregated descriptor to produce compact binary codes. Our technique contributes to the scalable compressed Fisher vector (SCFV) adopted by the CDVS standard. Moreover, the SCFV descriptor currently serves as the frame-level hand-crafted video feature, which motivates inheriting CDVS descriptors in the emerging CDVA standard. Furthermore, we show that our standard-compliant compact descriptor is complementary to deep features extracted from convolutional neural networks, yielding significant gains in mean average precision. Extensive evaluation on benchmark databases demonstrates the merits of the codebook-free binary codes for scalable visual search.
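The general idea described above (aggregate local descriptors against a small Gaussian mixture, keep only the Gaussian components a given image actually activates, then binarize) can be illustrated with a toy sketch. This is not the SCFV reference implementation or the paper's dual-selection algorithm. It assumes a diagonal-covariance GMM, uses only the first-order (mean) Fisher gradient, selects the `num_selected` components with the largest accumulated posterior mass, binarizes by sign, and compares codes with a plain normalized Hamming similarity over the components both images selected. Bit selection is not modeled. The names `soft_assign`, `binary_fisher_vector`, and `similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def soft_assign(x: Vector, weights: Sequence[float],
                means: Sequence[Vector], variances: Sequence[Vector]) -> List[float]:
    """Posterior gamma_k(x) of each component under a diagonal-covariance GMM."""
    log_probs = []
    for w, mu, var in zip(weights, means, variances):
        lp = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            lp -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        log_probs.append(lp)
    m = max(log_probs)  # log-sum-exp shift for numerical stability
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def binary_fisher_vector(descs: Sequence[Vector], weights: Sequence[float],
                         means: Sequence[Vector], variances: Sequence[Vector],
                         num_selected: int) -> Dict[int, List[int]]:
    """Map each selected component index to the sign bits of its mean gradient."""
    if not descs:
        raise ValueError("need at least one local descriptor")
    K, D = len(weights), len(means[0])
    grads = [[0.0] * D for _ in range(K)]
    mass = [0.0] * K
    for x in descs:
        gamma = soft_assign(x, weights, means, variances)
        for k in range(K):
            mass[k] += gamma[k]
            for d in range(D):
                grads[k][d] += gamma[k] * (x[d] - means[k][d]) / math.sqrt(variances[k][d])
    # Sample-specific component selection: keep the most-activated components.
    selected = sorted(range(K), key=lambda k: mass[k], reverse=True)[:num_selected]
    # The usual 1 / (N * sqrt(w_k)) normalization is positive, so it cannot
    # change any sign and is omitted before sign binarization.
    return {k: [1 if g > 0 else 0 for g in grads[k]] for k in selected}


def similarity(code_a: Dict[int, List[int]], code_b: Dict[int, List[int]]) -> float:
    """Fraction of agreeing bits over components selected by both codes."""
    shared = set(code_a) & set(code_b)
    if not shared:
        return 0.0
    agree = total = 0
    for k in shared:
        agree += sum(1 for u, v in zip(code_a[k], code_b[k]) if u == v)
        total += len(code_a[k])
    return agree / total
```

For example, with two 2-D components centered at (0, 0) and (10, 10), descriptors near the origin select only component 0, and images whose descriptors fall on opposite sides of that mean produce opposite bits. Transmitting bits only for selected components is what keeps such a code compact, at the cost of some images sharing no components at all.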