Abstract: In recent years, the amount of visual data has grown explosively. Hashing techniques have been successfully applied to the large-scale nearest neighbor search problem on data of this scale. However, existing hashing methods usually learn a single hash code for each data point and take only the content correlations among data points into account. In practice, when handling complex visual data such as video, strong temporal relations exist among successive frames. Moreover, to deliver the desired performance for large-scale video search, multiple hash codes are required for each data point in order to build multiple hash table indices. To address these problems, in this paper, we first study the multi-table learning problem for video search and attempt to learn binary codes by capturing the intrinsic video similarities from both the visual and the temporal aspects. By regarding the search over multiple tables as an ensemble prediction, the whole multi-table learning problem can be solved in a boosting learning manner so that the tables complementarily cover the nearest neighbors. For each table, a temporal binary coding solution is devised that simultaneously considers the intrinsic relations among the visual content and the temporal consistency among successive frames. More specifically, we approximate the intrinsic visual similarities using a low-rank matrix based on a sparse, non-negative feature representation. Furthermore, to preserve the temporal consistency, we introduce a subspace rotation to model the variation among successive frames. Under the boosting learning framework, the binary codes, hash functions and temporal variation of each table can be efficiently and jointly optimized. Extensive experiments on three large video datasets demonstrate that the proposed approach significantly outperforms a number of state-of-the-art hashing methods.
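
To make the multi-table search idea concrete, the following is a minimal sketch of how candidates from several hash tables can be merged at query time. It is not the method proposed in the paper: the learned, boosted hash functions are replaced here by random hyperplane projections, and the names `build_tables` and `query` are hypothetical helpers chosen for illustration.

```python
import numpy as np

def build_tables(data, n_tables, n_bits, rng):
    """Index the rows of `data` (n x d) in several independent hash tables.

    Assumption: random hyperplanes stand in for the learned hash functions.
    """
    tables = []
    for _ in range(n_tables):
        planes = rng.standard_normal((data.shape[1], n_bits))
        codes = data @ planes > 0            # one binary code per row
        buckets = {}
        for idx, code in enumerate(codes):
            buckets.setdefault(code.tobytes(), []).append(idx)
        tables.append((planes, buckets))
    return tables

def query(tables, x):
    """Return the union of the candidates found in the matching bucket of each table."""
    candidates = set()
    for planes, buckets in tables:
        key = (x @ planes > 0).tobytes()
        candidates.update(buckets.get(key, []))
    return candidates
```

Because the candidate set is a union over tables, adding a table can only enlarge it. This is why tables that cover different neighbors are more useful than tables that return the same ones.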

Binarized Mode Seeking for Scalable Visual Pattern Discovery

Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data

Cross-Indexing of Binary SIFT Codes for Large-Scale Image Search

Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

Top Rank Supervised Binary Coding for Visual Search

Coupled Binary Embedding for Large-Scale Image Retrieval

A Compact Binary Aggregated Descriptor Via Dual Selection for Visual Search

Boosting Temporal Binary Coding for Large-Scale Video Search

Sub-Selective Quantization for Large-Scale Image Search

Building Descriptive and Discriminative Visual Codebook for Large-Scale Image Applications

Visual word expansion and BSIFT verification for large-scale image search

Binary Feature from Intensity Quantization and Weakly Spatial Contextual Coding for Image Search

Common Visual Pattern Discovery Via Nonlinear Mean Shift Clustering

Binary Multi-View Clustering

“What-Where” sparse distributed invariant representations of visual patterns

SUBIC: A supervised, structured binary code for image search

Mining Compact Bag-of-Patterns for Low Bit Rate Mobile Visual Search

Spatial local binary patterns for scene image classification

Light-weight binary code embedding of local feature distribution in image search

Adaptive Binary Coding for Scene Classification Based on Convolutional Networks

Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval