Abstract: Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B, to improve the compression performance of deep B-frame coding. Our scheme has three key innovations. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding, which significantly reduces the bit cost of bi-directional motions. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model to make better use of the multi-scale temporal contexts. Third, we propose a hierarchical quality structure-based training strategy, leading to effective bit allocation across large groups of pictures (GOPs). Experimental results show that DCVC-B achieves an average BD-Rate reduction of 26.6% compared to the reference software for H.265/HEVC under random access conditions. Remarkably, it surpasses the performance of the H.266/VVC reference software on certain test datasets under the same configuration.
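For readers unfamiliar with the metric, BD-Rate summarizes the average bitrate difference between two codecs at equal quality. The sketch below is a simplified illustration, not the evaluation code used in the paper: it interpolates log-bitrate linearly between rate-distortion points, whereas the standard Bjøntegaard calculation fits a cubic polynomial. The function name `bd_rate` and its helpers are hypothetical, and the sample numbers are invented for the example.

```python
import math


def _interp(x, xs, ys):
    """Linearly interpolate y at x over sorted, distinct knots xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the curve's range")


def _integrate(xs, ys, lo, hi):
    """Trapezoidal integral of the piecewise-linear curve over [lo, hi]."""
    knots = sorted({lo, hi, *(x for x in xs if lo < x < hi)})
    vals = [_interp(x, xs, ys) for x in knots]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))


def bd_rate(anchor, test):
    """Approximate BD-Rate in percent.

    anchor, test: lists of (bitrate, psnr) points with distinct PSNR values.
    Assumption: piecewise-linear interpolation of log10(bitrate) over PSNR,
    a simplification of the usual cubic fit.
    """
    def curve(points):
        pts = sorted(points, key=lambda p: p[1])
        return [p[1] for p in pts], [math.log10(p[0]) for p in pts]

    ax, ay = curve(anchor)
    tx, ty = curve(test)
    # Compare only over the quality range covered by both curves.
    lo, hi = max(ax[0], tx[0]), min(ax[-1], tx[-1])
    avg_log_diff = (_integrate(tx, ty, lo, hi)
                    - _integrate(ax, ay, lo, hi)) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100


# Made-up rate-distortion points (kbps, dB), for illustration only.
anchor_points = [(1000, 32.0), (2000, 35.0), (4000, 38.0), (8000, 41.0)]
test_points = [(rate / 2, psnr) for rate, psnr in anchor_points]
print(round(bd_rate(anchor_points, test_points), 2))
```

A negative result means the tested codec needs fewer bits than the anchor for the same quality, so a 26.6% BD-Rate reduction corresponds to a value of about -26.6 here.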

What problem does this paper attempt to address?

The paper aims to improve the compression performance of B-frames in deep learning-based video compression. It proposes a bi-directional deep contextual video compression scheme (DCVC-B) to raise the efficiency of B-frame coding, where existing deep learning methods perform significantly worse than traditional video codecs such as H.265/HEVC and H.266/VVC.

The paper identifies three main issues with current deep learning B-frame compression methods:

1. **High bit cost of bi-directional motion vectors**: Encoding bi-directional motion vectors requires more bitrate, and existing motion difference coding methods fail to sufficiently reduce motion redundancy.
2. **Insufficient use of temporal prediction**: Although some methods apply conditional coding to leverage feature-based temporal prediction, the temporal correlation between different coding modules is still underutilized.
3. **Inefficient training strategies**: Most training strategies fail to establish an effective quality hierarchy across large GOPs, leading to poor bit allocation.

To address these issues, the paper proposes three main innovations:

- **Bi-directional motion difference context propagation method**: Reduces the bit cost of coding bi-directional motion differences.
- **Bi-directional contextual compression model and corresponding bi-directional temporal entropy model**: Make better use of multi-scale temporal contexts.
- **Training strategy based on a hierarchical quality structure**: Achieves effective bit allocation within large GOPs.

Experimental results show that, compared to the H.265/HEVC reference software, DCVC-B reduces BD-Rate by an average of 26.6% under the random access configuration. On some test datasets, it even surpasses the H.266/VVC reference software.
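The quality hierarchy mentioned above builds on the hierarchical B-frame structure used in random access coding, where each B-frame is predicted from one past and one future frame that were coded earlier. The following is a minimal sketch of one valid coding order (layer by layer) for a power-of-two GOP. It is not taken from the paper: the function name `hierarchical_order` and the tuple layout are assumptions made for illustration, and both GOP boundary frames are treated as key frames without bi-directional references.

```python
def hierarchical_order(gop_size):
    """List (frame, layer, past_ref, future_ref) tuples in coding order.

    Assumption: gop_size is a power of two, and frames 0 and gop_size
    are key frames (layer 0) coded before any B-frame in between.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")

    order = [(0, 0, None, None), (gop_size, 0, None, None)]
    step, layer = gop_size, 1
    while step > 1:
        half = step // 2
        # Each B-frame sits midway between two already-coded frames.
        for left in range(0, gop_size, step):
            order.append((left + half, layer, left, left + step))
        step, layer = half, layer + 1
    return order


for frame, layer, past, future in hierarchical_order(8):
    print(f"frame {frame}: layer {layer}, refs ({past}, {future})")
```

Frames in lower layers serve as references for more of the remaining frames, which is why a quality hierarchy typically spends more bits on them. How the paper weights each layer during training is not stated in this summary, so no weights are shown.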

Bi-Directional Deep Contextual Video Compression
