Abstract: In recent years, the amount of visual data has grown explosively. Hashing techniques have been successfully applied to large-scale nearest neighbor search over data at this scale. However, existing hashing methods usually learn a single hash code for each data point and take only the content correlations among data points into account. In practice, when handling complex visual data such as video, strong temporal relations exist among successive frames. Moreover, to deliver the desired performance for large-scale video search, multiple hash codes are required for each data point so that multiple hash table indices can be built. To address these problems, in this paper we first study the multi-table learning problem for video search and attempt to learn binary codes that capture the intrinsic video similarities from both the visual and the temporal aspects. By regarding the search over multiple tables as an ensemble prediction, the whole multi-table learning problem can be solved in a boosting manner so that the tables complementarily cover the nearest neighbors. For each table, we devise a temporal binary coding solution that simultaneously considers the intrinsic relations among the visual content and the temporal consistency among successive frames. More specifically, we approximate the intrinsic visual similarities using a low-rank matrix based on a sparse, non-negative feature representation. Furthermore, to preserve the temporal consistency, we introduce a subspace rotation to model the variation among successive frames. Under the boosting learning framework, the binary codes, hash functions and temporal variation of each table can be efficiently and jointly optimized. Extensive experiments on three large video datasets demonstrate that the proposed approach significantly outperforms a number of state-of-the-art hashing methods.
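
To make the multi-table search idea concrete, the following is a minimal sketch in Python. It assumes plain random-hyperplane hashing as a stand-in for the learned, boosted hash functions described in the abstract, and it does not model the temporal component at all. The names `make_planes`, `hash_code`, `build_index` and `query` are hypothetical and chosen only for illustration.

```python
import random

def make_planes(dim, n_bits, rng):
    """Draw n_bits random hyperplanes (assumed stand-in for learned hash functions)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def hash_code(planes, x):
    """Binary code of x: one sign bit per hyperplane."""
    return tuple(1 if sum(w * v for w, v in zip(p, x)) >= 0 else 0 for p in planes)

def build_index(data, dim, n_tables, n_bits, seed=0):
    """Build n_tables independent hash tables over the data points."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        planes = make_planes(dim, n_bits, rng)
        buckets = {}
        for i, x in enumerate(data):
            buckets.setdefault(hash_code(planes, x), []).append(i)
        tables.append((planes, buckets))
    return tables

def query(tables, q):
    """Return the union of the buckets that q falls into, one bucket per table."""
    candidates = set()
    for planes, buckets in tables:
        candidates.update(buckets.get(hash_code(planes, q), []))
    return candidates

# Toy usage: 200 random 8-dimensional points, 4 tables of 6 bits each.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
tables = build_index(data, dim=8, n_tables=4, n_bits=6)
candidates = query(tables, data[0])
```

Each extra table can only add candidates to the union, which is why the search over several tables behaves like an ensemble. In this sketch the tables are independent and random; the approach in the abstract instead learns each table so that it covers neighbors missed by the previous ones.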
