Abstract: With the spread of camera-equipped devices such as mobile phones and the rise of short-video platforms, large numbers of videos are uploaded to the Internet continuously, which makes a fast and accurate video retrieval system necessary. Content-based video retrieval (CBVR) has therefore attracted the interest of many researchers. A typical CBVR system has two essential parts: video feature extraction and similarity comparison. Video feature extraction is challenging: previous video retrieval methods mostly extract features from individual frames, which loses the temporal information in the video. Hashing methods are widely used in multimedia information retrieval because of their retrieval efficiency, but most of them have so far been applied only to image retrieval. To address these problems in video retrieval, we build an end-to-end framework called deep supervised video hashing (DSVH). It employs a 3D convolutional neural network (CNN) to obtain spatio-temporal features of videos, then trains a set of hash functions by supervised hashing to map the video features into binary space and obtain compact binary codes. The network is trained with a triplet loss. We conduct extensive experiments on three public video datasets, UCF-101, JHMDB and HMDB-51, and the results show that the proposed method has advantages over many state-of-the-art video retrieval methods. Compared with the DVH method, the mAP on the UCF-101 dataset improves by 9.3%, and on the JHMDB dataset the smallest improvement is 0.3%. We also demonstrate the stability of the algorithm on the HMDB-51 dataset.
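The pipeline described in the abstract (features, hash functions, triplet loss, binary codes) can be illustrated with a minimal NumPy sketch. It is not the DSVH implementation: the 3D CNN is replaced by random feature vectors, and the linear hash layer `W`, the tanh relaxation, the squared Euclidean distance, and the margin value are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def relaxed_codes(features, W):
    """Map features to (-1, 1) with tanh (assumed relaxation of binary codes)."""
    return np.tanh(features @ W)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on squared Euclidean distances, averaged over the batch."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

def binarize(codes):
    """Threshold relaxed codes to {-1, +1}."""
    return np.where(codes >= 0, 1, -1).astype(np.int8)

def hamming_distance(query, database):
    """Hamming distance between one +/-1 code and each row of a +/-1 code matrix.

    For +/-1 codes of length K, distance = (K - dot(query, row)) / 2.
    """
    n_bits = database.shape[1]
    return (n_bits - database.astype(np.int32) @ query.astype(np.int32)) // 2

def rank_by_hamming(query, database):
    """Return database indices sorted from nearest to farthest (stable for ties)."""
    return np.argsort(hamming_distance(query, database), kind="stable")

# Hypothetical stand-in for 3D CNN output: 6 videos, 16-dim features, 8-bit codes.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 16))
W = rng.normal(size=(16, 8))

codes = binarize(relaxed_codes(features, W))
ranking = rank_by_hamming(codes[0], codes)  # video 0 ranks itself first
```

In a supervised setting, the anchor and positive would be videos of the same class and the negative a video of a different class, so that minimizing the loss pulls same-class codes together before binarization. Retrieval then only needs integer Hamming distances over the compact codes.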

Unsupervised Variational Video Hashing with 1D-CNN-LSTM Networks

Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data

Unsupervised Video Hashing by Exploiting Spatio-Temporal Feature

Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval

A Supervised Video Hashing Method Based on a Deep 3D Convolutional Neural Network for Large-Scale Video Retrieval

Unsupervised Deep Video Hashing with Balanced Rotation

Unsupervised Teacher-Student Model for Large-Scale Video Retrieval

Self-Supervised Video Hashing Via Bidirectional Transformers

Efficient Unsupervised Video Hashing with Contextual Modeling and Structural Controlling

Heterogeneous Hashing Network for Face Retrieval Across Image and Video Domains

Neighborhood Preserving Hashing for Scalable Video Retrieval

Unsupervised Video Hashing with Multi-granularity Contextualization and Multi-structure Preservation

Joint Multi-View Hashing for Large-Scale Near-Duplicate Video Retrieval

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

Scalable Multimedia Retrieval By Deep Learning Hashing With Relative Similarity Learning

CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing

Deep Variational and Structural Hashing

Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval

Multiple Feature Hashing for Real-Time Large Scale Near-Duplicate Video Retrieval

Video Moment Localization via Deep Cross-Modal Hashing

Contrastive Transformer Hashing for Compact Video Representation