Abstract: Recently, deep learning-based image compression has shown significant improvements in coding efficiency and subjective quality. However, relatively little effort has gone into video compression based on deep neural networks. In this paper, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, that uses mode-selective uni- and bi-directional predictions based on a multi-frame hypothesis, together with a multi-scale structure and a temporal-context-adaptive entropy model. DeepPVCnet jointly compresses the motion information and residual data that are generated from the multi-scale structure via feature transformation layers. Recent deep learning-based video compression methods were proposed for a limited compression setting that uses only P-frames or only B-frames. Learning from the lessons of conventional video codecs, we are the first to incorporate a mode-selective framework into a network of this kind, choosing between uni- and bi-directional predictive modes in a rate-distortion minimization sense. We also propose a temporal-context-adaptive entropy model that uses the temporal context information of the reference frames when coding the current frame. The autoregressive entropy models used in CNN-based image and video compression are difficult to compute with parallel processing. In contrast, our temporal-context-adaptive entropy model draws temporally coherent context from the reference frames, so the context information can be computed in parallel, which is computationally and architecturally advantageous. Extensive experiments show that DeepPVCnet outperforms AVC/H.264, HEVC/H.265 and state-of-the-art methods in terms of MS-SSIM.
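The mode selection "in a rate-distortion minimization sense" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional Lagrangian cost J = D + λR from classical codecs; the abstract does not state the exact cost, so the names `ModeResult`, `rd_cost`, `select_mode`, and the numeric values below are hypothetical illustrations, not DeepPVCnet's actual implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModeResult:
    """Outcome of coding one frame with one candidate prediction mode (hypothetical)."""
    name: str          # illustrative label, e.g. "uni" or "bi"
    distortion: float  # e.g. 1 - MS-SSIM of the reconstructed frame
    rate: float        # e.g. bits per pixel spent on motion and residual


def rd_cost(result: ModeResult, lam: float) -> float:
    """Assumed Lagrangian rate-distortion cost J = D + lambda * R."""
    return result.distortion + lam * result.rate


def select_mode(candidates: Sequence[ModeResult], lam: float) -> ModeResult:
    """Return the candidate with the smallest rate-distortion cost."""
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    return min(candidates, key=lambda r: rd_cost(r, lam))


# Made-up numbers: the bi-directional mode reconstructs better but costs more bits.
candidates = [
    ModeResult(name="uni", distortion=0.020, rate=0.10),
    ModeResult(name="bi", distortion=0.015, rate=0.14),
]
best = select_mode(candidates, lam=0.1)
print(best.name)
```

With these made-up numbers, a small λ favors the lower-distortion bi-directional mode, while a larger λ penalizes its extra bits and shifts the choice to the uni-directional mode.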

Temporal Context Mining for Learned Video Compression

Learned Video Compression With Efficient Temporal Context Learning

Temporal context video compression with flow-guided feature prediction

Enhancing Temporal Context for Learned Video Compression

Long-term Temporal Context Gathering for Neural Video Compression

Foreground-Background Parallel Compression with Residual Encoding for Surveillance Video

Exploring Long- and Short-Range Temporal Information for Learned Video Compression

Learned Video Compression Via Joint Spatial-Temporal Correlation Exploration

Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression

Bi-Directional Deep Contextual Video Compression

Learned Video Compression with Adaptive Temporal Prior and Decoded Motion-aided Quality Enhancement

Neural video compression using spatio-temporal priors

Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression

Neural Video Compression using Spatio-Temporal Priors

Deep Predictive Video Compression Using Mode-Selective Uni- and Bi-Directional Predictions Based on Multi-Frame Hypothesis

A channel-wise contextual module for learned intra video compression

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

Learning-Based End-to-End Video Compression with Spatial-Temporal Adaptation

High Efficiency Deep-learning Based Video Compression