Perceptual Robust Hashing for Video Copy Detection with Unsupervised Learning

Gejian Zhao, Chuan Qin, Xiangyang Luo, Xinpeng Zhang, Chin-Chen Chang
DOI: https://doi.org/10.1145/3577163.3595110
2023-01-01
Abstract: In this paper, we propose an end-to-end perceptual robust hashing scheme for video copy detection based on unsupervised learning. First, the spatio-temporal information in videos is fused and condensed into high-dimensional features by a multi-scale feature fusion model built on a 3D-CNN, in which the Inception block and a 3D self-attention mechanism are integrated. Then, we calculate the correlation distances between the extracted features to differentiate perceptual contents. Based on the resulting similarity relationships, we dynamically generate pseudo-labels and use them to guide model training for video hash generation. In addition, we design dual constraints so that the hash code achieves satisfactory robustness and discrimination. Extensive experiments demonstrate that the proposed scheme achieves better copy detection performance than existing schemes and performs well even under manipulations not seen during training.
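The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the correlation distance is one minus the Pearson correlation, that a fixed threshold (the hypothetical parameter `threshold`) separates similar from dissimilar pairs, and that hash bits come from simple sign binarization. The function names and the toy feature vectors are also illustrative assumptions.

```python
import math


def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length vectors (assumed definition)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if den == 0.0:
        # A constant vector has no defined correlation; treat it as uncorrelated.
        return 1.0
    return 1.0 - num / den


def pseudo_labels(features, threshold=0.5):
    """Label each pair 1 (similar) if its distance is below the threshold, else 0.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    n = len(features)
    return [
        [1 if correlation_distance(features[i], features[j]) < threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def binarize(values):
    """Turn real-valued outputs into hash bits by sign (assumed binarization)."""
    return [1 if x > 0 else 0 for x in values]


def hamming_distance(a, b):
    """Count the positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy features: the first two vectors stand in for a video and a lightly
# modified copy, the third for perceptually different content.
features = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 1.0, 3.0, 0.5],
]
labels = pseudo_labels(features)
```

In this toy setup the first two vectors are strongly correlated and are labeled as a similar pair, while the third is labeled dissimilar to both. In a training loop, labels of this kind would be recomputed as the features change, which is one plausible reading of "dynamically generate" in the abstract.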