Rate-Performance-Loss Optimization for Inter-Frame Deep Feature Coding from Videos
Lin Ding, Yonghong Tian, Hongfei Fan, Yaowei Wang, Tiejun Huang
DOI: https://doi.org/10.1109/tip.2017.2745203
IF: 10.6
2017-01-01
IEEE Transactions on Image Processing
Abstract: With the explosion in the use of cameras in mobile phones and video surveillance systems, it is impractical to transmit the large volume of video captured over a wide area to the cloud for big data analysis and retrieval. A feasible alternative is to extract and compress features from the videos and transmit only the compact features to the cloud. Meanwhile, many recent studies indicate that features extracted from deep convolutional neural networks yield high performance on various analysis and recognition tasks. However, how to compress deep video features while maintaining analysis or retrieval performance remains an open problem. To address this problem, we propose a high-efficiency deep feature coding (DFC) framework. In the DFC framework, we define three types of features in a group-of-features (GOF) according to their coding modes (i.e., I-feature, P-feature, and S-feature). We then design two prediction structures for the features in a GOF: a sequential prediction structure and an adaptive prediction structure. As in video coding, optimizing P-feature residual coding requires a tradeoff between feature bitrate and analysis/retrieval performance when encoding residuals. To this end, we propose a rate-performance-loss optimization model. To evaluate various feature coding methods for large-scale video retrieval, we construct a video feature coding data set, called VFC-1M, which consists of uncompressed videos from different scenarios captured by real-world surveillance cameras, with 1M visual objects in total. Extensive experiments show that the proposed DFC can significantly reduce the bitrate of deep features in videos while maintaining retrieval accuracy.
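The abstract does not spell out how the three coding modes or the optimization model are defined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes, by analogy with video coding, that an I-feature is coded independently, a P-feature is coded as a quantized residual against the previously reconstructed feature (a sequential prediction structure), and an S-feature is skipped and reuses the reference. The cost J = loss + lam * rate uses squared error as a stand-in for performance loss and a crude bit count as a stand-in for rate. All function names, parameters, and thresholds here are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def quantize(residual: Vector, step: float) -> List[int]:
    """Uniform scalar quantization of a residual vector."""
    return [round(r / step) for r in residual]


def dequantize(levels: List[int], step: float) -> Vector:
    """Inverse of quantize, up to quantization error."""
    return [q * step for q in levels]


def rpl_cost(residual: Vector, step: float, lam: float) -> Tuple[float, List[int]]:
    """Toy cost J = loss + lam * rate (both terms are assumed proxies)."""
    levels = quantize(residual, step)
    recon = dequantize(levels, step)
    # Assumption: squared reconstruction error stands in for performance loss.
    loss = sum((r - x) ** 2 for r, x in zip(residual, recon))
    # Assumption: magnitude bits plus a sign bit per nonzero level stands in for rate.
    rate = sum(abs(q).bit_length() + 1 for q in levels if q != 0)
    return loss + lam * rate, levels


def encode_gof(features: List[Vector], steps=(0.05, 0.1, 0.2),
               lam: float = 0.01, skip_threshold: float = 1e-3):
    """Encode one group-of-features with sequential prediction."""
    coded = []
    reference: Optional[Vector] = None
    for feat in features:
        if reference is None:
            # First feature in the group: coded independently (I-feature).
            coded.append(("I", list(feat)))
            reference = list(feat)
            continue
        residual = [f - r for f, r in zip(feat, reference)]
        if sum(r * r for r in residual) < skip_threshold:
            # Nearly identical to the reference: skip it (S-feature).
            coded.append(("S", None))
            continue
        # P-feature: pick the quantization step with the lowest toy cost.
        best = None
        for step in steps:
            cost, levels = rpl_cost(residual, step, lam)
            if best is None or cost < best[0]:
                best = (cost, step, levels)
        _, step, levels = best
        coded.append(("P", (step, levels)))
        # Track the decoder-side reconstruction so encoder and decoder stay in sync.
        reference = [r + d for r, d in zip(reference, dequantize(levels, step))]
    return coded


def decode_gof(coded) -> List[Vector]:
    """Reconstruct all features of a group from the coded stream."""
    out = []
    reference: Optional[Vector] = None
    for mode, payload in coded:
        if mode == "I":
            reference = list(payload)
        elif mode == "P":
            step, levels = payload
            reference = [r + d for r, d in zip(reference, dequantize(levels, step))]
        # Mode "S" reuses the current reference unchanged.
        out.append(list(reference))
    return out
```

In this sketch, a larger `lam` favors coarser quantization steps (fewer bits, larger loss), which is the kind of tradeoff between bitrate and analysis/retrieval performance that the abstract describes. A real system would measure loss with an actual retrieval metric rather than squared error.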