Abstract: With the spread of camera-equipped devices such as mobile phones and the rise of short-video platforms, videos are uploaded to the Internet continuously, so a video retrieval system that is both fast and accurate is increasingly necessary. Content-based video retrieval (CBVR) has therefore attracted the interest of many researchers. A typical CBVR system has two essential parts: video feature extraction and similarity comparison. Extracting video features is challenging. Most previous video retrieval methods extract features from individual video frames, which loses the temporal information in the videos. Hashing methods are widely used in multimedia information retrieval because of their retrieval efficiency, but most of them have so far been applied only to image retrieval. To address these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH). It employs a 3D convolutional neural network (CNN) to obtain spatial-temporal features of videos, then trains a set of hash functions by supervised hashing to map the video features into binary space and obtain compact binary codes for the videos. Finally, the network is trained with a triplet loss. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB and HMDB-51, and the results show that the proposed method compares favorably with many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP on the UCF-101 dataset improves by 9.3%, and even the smallest improvement, on the JHMDB dataset, is 0.3%. We also demonstrate the stability of the algorithm on the HMDB-51 dataset.
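The abstract names the stages of the pipeline (hash functions that binarize video features, a triplet loss for training, retrieval over compact binary codes, evaluation by mAP) without giving their form. The sketch below assumes a common design for deep supervised hashing, not necessarily the exact DSVH formulation: a linear hash layer with a tanh relaxation during training and a sign function at retrieval time, a margin-based triplet loss on squared Euclidean distance between relaxed codes, ranking by Hamming distance, and mean average precision. The 3D CNN is not modeled; random vectors stand in for its clip features. All function names, the margin of 1.0 and the dimensions are hypothetical.

```python
import math
import random

# Minimal sketch of a supervised hashing head, assuming a common design
# (tanh relaxation + sign, triplet loss, Hamming ranking). Not the DSVH code.

def hash_layer(feature, weights, relaxed=True):
    """Project a real-valued video feature onto K hash bits (one weight row per bit).

    relaxed=True returns tanh outputs in (-1, 1), a differentiable stand-in used
    during training; relaxed=False returns binary codes in {-1, +1} (0 maps to +1).
    """
    projections = [sum(w * x for w, x in zip(row, feature)) for row in weights]
    if relaxed:
        return [math.tanh(p) for p in projections]
    return [1 if p >= 0 else -1 for p in projections]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: the same-class code should be closer to the anchor than
    the different-class code by at least `margin` (squared Euclidean distance)."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def hamming_distance(code_a, code_b):
    """Count the bits on which two {-1, +1} codes differ."""
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def rank_database(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query
    (ties keep database order because sorted() is stable)."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))


def average_precision(ranked_relevance):
    """AP of one ranked list of booleans (True = same class as the query)."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """mAP: the mean of per-query average precision."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, bits = 16, 8
    # Hypothetical placeholders: random vectors stand in for 3D-CNN clip
    # features, and random weights stand in for a trained hash layer.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
    features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(6)]
    labels = [0, 0, 1, 1, 2, 2]
    codes = [hash_layer(f, weights, relaxed=False) for f in features]
    query = 0
    order = [i for i in rank_database(codes[query], codes) if i != query]
    relevance = [labels[i] == labels[query] for i in order]
    print("ranking:", order, "AP:", average_precision(relevance))
```

The relaxation is what makes a triplet loss usable here: on codes in {-1, +1}, each differing bit contributes (±2)² = 4 to the squared Euclidean distance, so that distance is exactly four times the Hamming distance, and minimizing it on the tanh outputs approximates an objective in Hamming space. In a full system the gradient of the loss would flow back through tanh into the 3D CNN, which this sketch does not attempt.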

Attention-Based Video Hashing for Large-Scale Video Retrieval

Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval

Large-scale Image Retrieval Based on Boosting Iterative Quantization Hashing with Query-Adaptive Reranking

A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval

Heterogeneous Hashing Network for Face Retrieval Across Image and Video Domains

Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval

Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval

Multiple Feature Hashing for Real-Time Large Scale Near-Duplicate Video Retrieval

Joint Multi-View Hashing for Large-Scale Near-Duplicate Video Retrieval

Unsupervised Teacher-Student Model for Large-Scale Video Retrieval

Neighborhood Preserving Hashing for Scalable Video Retrieval

Scalable Multimedia Retrieval By Deep Learning Hashing With Relative Similarity Learning

Unsupervised Variational Video Hashing with 1D-CNN-LSTM Networks

Unsupervised Video Hashing by Exploiting Spatio-Temporal Feature

Large-Scale Video Hashing Via Structure Learning

Discriminative Codebook Hashing for Supervised Video Retrieval

Unsupervised Deep Video Hashing with Balanced Rotation

Self-Supervised Video Hashing Via Bidirectional Transformers

Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling

Encode the Unseen: Predictive Video Hashing for Scalable Mid-Stream Retrieval

Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation