A Compression and Recognition Joint Model for Structured Video Surveillance Storage

Dongna Du, Chenghao Zhang, Yanbo Wang, Xiaoyun Kuang, Yiwei Yang, Kaitian Huang, Kejie Huang
DOI: https://doi.org/10.1145/3474198.3478250
2021-01-01
Abstract: Structured data storage of surveillance video helps to reduce the time needed for information retrieval. However, modern surveillance systems have to perform compression and recognition on two separate devices, which wastes processing time and hardware resources. This paper proposes a joint model that performs video recognition and compression simultaneously. The proposed model stores the video in a structured format, enabling high-level vision tasks such as structured storage, content analysis, and intelligent retrieval. In the recognition network, large and small features are extracted at intervals, which significantly reduces the computation cost. The experimental results show that our compression module uses 68.4% fewer bits per pixel (bpp) than H.264 at the same Multi-Scale Structural Similarity (MS-SSIM) of 0.965. Meanwhile, our recognition module requires 47.2% fewer giga floating-point operations (GFLOPs) than ARTNet while achieving 64.1% top-1 and 84.2% top-5 accuracy. Overall, the proposed joint model saves nearly 28% of computing resources.