Short Video Fingerprint Extraction: from Audio–visual Fingerprint Fusion to Multi-Index Hashing

Shuying Zhang, Jing Zhang, Yizhou Wang, Li Zhuo
DOI: https://doi.org/10.1007/s00530-022-01031-4
IF: 3.9
2022-01-01
Multimedia Systems
Abstract: Short video, one of the most prevalent forms of we-media, has grown exponentially and has become a hotspot for copyright infringement. Video fingerprint extraction supports the automated identification of short videos. To withstand various tampering attacks, this paper proposes a short video fingerprint extraction method that proceeds from audio–visual fingerprint fusion to multi-index hashing. The method has three parts: (1) a shot-level fingerprint is extracted by fusing audio and visual fingerprints after a consistency analysis that eliminates uncertainty at the decision-making layer, where the visual fingerprint is generated by an R(2 + 1)D network and the audio fingerprint combines audio features extracted with masked audio spectral keypoints (MASK) and a convolutional recurrent neural network (CRNN); (2) the shot-level fingerprints are assembled into a data-level fingerprint of the short video by constructing a data structure that models the data-shot-key frame relationship; (3) short video fingerprints are matched by building a multi-index hashing of the data-level fingerprint and measuring the weighted Hamming distance. Five experiments are conducted on the CC_Web_Video dataset and the Moments_in_Time_Raw_v2 dataset, and the results show that the method effectively improves the overall performance of short video fingerprinting.
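The matching step in part (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `MultiIndexHash`, the helper names, the 16-bit toy fingerprints, and the uniform weights are all assumptions made for illustration, and the abstract does not say how the paper's weights are derived. The sketch uses the simplest form of multi-index hashing, in which each binary fingerprint is split into `m` substrings, each substring is indexed in its own table, and any stored fingerprint sharing at least one identical substring with the query becomes a candidate. By the pigeonhole principle this finds every fingerprint that differs from the query in fewer than `m` bits. Candidates are then ranked by weighted Hamming distance.

```python
from collections import defaultdict


def split_code(code, m):
    """Split a bit string into m contiguous substrings of near-equal length."""
    n = len(code)
    bounds = [i * n // m for i in range(m + 1)]
    return [code[bounds[i]:bounds[i + 1]] for i in range(m)]


def weighted_hamming(a, b, weights):
    """Sum the weights of the positions where two bit strings differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


class MultiIndexHash:
    """Toy multi-index hash: one table per substring position (hypothetical)."""

    def __init__(self, m):
        self.m = m
        self.tables = [defaultdict(list) for _ in range(m)]
        self.codes = {}

    def add(self, key, code):
        """Store a fingerprint and index each of its m substrings."""
        self.codes[key] = code
        for table, chunk in zip(self.tables, split_code(code, self.m)):
            table[chunk].append(key)

    def query(self, code, weights, top_k=1):
        """Collect candidates sharing any substring, then rank them."""
        candidates = set()
        for table, chunk in zip(self.tables, split_code(code, self.m)):
            candidates.update(table.get(chunk, ()))
        ranked = sorted(
            candidates,
            key=lambda k: (weighted_hamming(code, self.codes[k], weights), k),
        )
        return ranked[:top_k]


index = MultiIndexHash(m=4)
index.add("clip_a", "1010110011110000")
index.add("clip_b", "0101001100001111")

# Query differs from clip_a in the last bit only (e.g. a mild tampering attack).
uniform_weights = [1.0] * 16  # assumption: the paper's actual weights are not given here
print(index.query("1010110011110001", uniform_weights))
```

Only the candidates returned by the substring tables are compared in full, which is what makes the lookup cheaper than a linear scan over all stored fingerprints. A complete multi-index hashing scheme would also probe substrings within a small Hamming radius to support larger search radii; that step is omitted here for brevity.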