Generative Memorize-Then-Recall framework for low bit-rate Surveillance Video Compression

Yaojun Wu,Tianyu He,Zhibo Chen
DOI: https://doi.org/10.48550/arXiv.1912.12847
2020-05-06
Abstract:Applications of surveillance video have developed rapidly in recent years to protect public safety and daily life, which often detect and recognize objects in video sequences. Traditional coding frameworks remove temporal redundancy in surveillance video by block-wise motion compensation, lacking the extraction and utilization of inherent structure information. In this paper, we figure out this issue by disentangling surveillance video into the structure of a global spatio-temporal feature (memory) for Group of Picture (GoP) and skeleton for each frame (clue). The memory is obtained by sequentially feeding frame inside GoP into a recurrent neural network, describing appearance for objects that appeared inside GoP. While the skeleton is calculated by a pose estimator, it is regarded as a clue to recall memory. Furthermore, an attention mechanism is introduced to obtain the relation between appearance and skeletons. Finally, we employ generative adversarial network to reconstruct each frame. Experimental results indicate that our method effectively generates realistic reconstruction based on appearance and skeleton, which show much higher compression performance on surveillance video compared with the latest video compression standard H.265.
Image and Video Processing,Computer Vision and Pattern Recognition,Machine Learning
What problem does this paper attempt to address?