Bi-Directional Deep Contextual Video Compression

Xihua Sheng, Li Li, Dong Liu, Shiqi Wang
2024-08-16
Abstract: Deep video compression has made remarkable progress in recent years, with most advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B, to improve the compression performance of deep B-frame coding. Our scheme has three key innovations. First, we develop a bi-directional motion difference context propagation method for effective motion difference coding, which significantly reduces the bit cost of bi-directional motions. Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model to make better use of the multi-scale temporal contexts. Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures (GOPs). Experimental results show that our DCVC-B achieves an average BD-Rate reduction of 26.6% compared to the reference software for H.265/HEVC under random access conditions. Remarkably, it surpasses the performance of the H.266/VVC reference software on certain test datasets under the same configuration.
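The hierarchical quality structure mentioned in the abstract builds on the hierarchical-B GOP layout used in random access coding, where some B-frames are referenced by many later frames and others by none. As a minimal sketch, assuming a dyadic GOP whose two boundary frames are already decoded (as in the common HEVC random access setup), the Python below derives each B-frame's coding order, temporal layer, and two reference frames. The function names and the `0.5 ** (layer - 1)` per-layer weight are hypothetical illustrations only; the summary does not give DCVC-B's actual training weights or GOP sizes.

```python
def hierarchical_b_order(gop_size):
    """Return (poc, layer, left_ref, right_ref) tuples in coding order for one
    dyadic hierarchical-B GOP whose boundary frames (POC 0 and POC gop_size)
    are assumed to be decoded already."""
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two >= 2")
    order = []

    def split(left, right, layer):
        # Code the middle frame from its two nearest decoded neighbours,
        # then recurse into each half one temporal layer deeper.
        if right - left < 2:
            return
        mid = (left + right) // 2
        order.append((mid, layer, left, right))
        split(left, mid, layer + 1)
        split(mid, right, layer + 1)

    split(0, gop_size, 1)
    return order


def hypothetical_layer_weight(layer):
    # Hypothetical: halve the distortion weight at each deeper layer, so frames
    # that serve as references for many others are trained toward higher quality.
    return 0.5 ** (layer - 1)


if __name__ == "__main__":
    for poc, layer, left, right in hierarchical_b_order(8):
        print(f"POC {poc}: layer {layer}, refs {left} and {right}, "
              f"weight {hypothetical_layer_weight(layer)}")
```

For a GOP of 8, the frames are coded in the order POC 4, 2, 1, 3, 6, 5, 7, with POC 4 alone in layer 1. The depth-first recursion guarantees that both references of every frame are decoded before it, and a larger GOP simply adds deeper layers (a GOP of 32 yields five layers), which is why bit allocation across layers matters more as GOPs grow.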
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to improve the compression performance of B-frames in deep learning-based video compression. Specifically, it proposes a bidirectional deep contextual video compression scheme (DCVC-B) aimed at improving the efficiency of B-frame coding. Existing deep learning-based B-frame methods still perform significantly worse than traditional video codecs such as H.265/HEVC and H.266/VVC. The paper identifies three main issues with current deep B-frame compression methods:

1. **High compression cost of bidirectional motion vectors**: Encoding motion in two directions costs more bits, and existing motion difference coding methods do not sufficiently remove motion redundancy.
2. **Insufficient utilization of temporal prediction**: Although some methods apply conditional coding to exploit feature-based temporal prediction, the temporal correlation across different coding modules remains underutilized.
3. **Inefficient training strategies**: Most training strategies fail to establish an effective quality hierarchy across large GOPs, which leads to poor bit allocation.

To address these issues, the paper proposes three main innovations:

- **Bidirectional motion difference context propagation method**: Effectively reduces the cost of encoding bidirectional motion differences.
- **Bidirectional contextual compression model and corresponding bidirectional temporal entropy model**: Make better use of multi-scale temporal contexts.
- **Training strategy based on a hierarchical quality structure**: Achieves reasonable bit allocation within large GOPs.

Experimental results show that, compared to the reference software of the H.265/HEVC standard, DCVC-B reduces BD-Rate by an average of 26.6% under the random access configuration. On some test datasets, its performance even surpasses that of the H.266/VVC reference software.
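The 26.6% figure is a Bjøntegaard delta rate (BD-Rate): the average bitrate difference between two rate-distortion curves at equal quality, where negative values mean the tested codec needs fewer bits. The sketch below follows the commonly used formulation, assuming PSNR as the quality metric and at least four rate points per codec: fit log10(rate) as a cubic polynomial of PSNR for each codec, integrate both fits over the overlapping PSNR range, and convert the average log-rate gap back into a percentage. It is written in plain Python (PSNR is centered before fitting for numerical stability) and is not the evaluation script used in the paper; the function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def _fit_cubic(xs, ys):
    """Least-squares fit y ~ c0 + c1*t + c2*t^2 + c3*t^3 with t = x - center."""
    center = sum(xs) / len(xs)
    ts = [x - center for x in xs]
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V of ts.
    ata = [[sum(t ** (i + j) for t in ts) for j in range(4)] for i in range(4)]
    aty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(4)]
    return center, _solve(ata, aty)


def _integral(center, coeffs, lo, hi):
    """Exact integral of the fitted cubic over [lo, hi]."""
    def antideriv(x):
        t = x - center
        return sum(c * t ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))
    return antideriv(hi) - antideriv(lo)


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """BD-Rate of the test codec versus the anchor, in percent."""
    ca, pa = _fit_cubic(anchor_psnr, [math.log10(r) for r in anchor_rates])
    ct, pt = _fit_cubic(test_psnr, [math.log10(r) for r in test_rates])
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("PSNR ranges of the two curves do not overlap")
    avg_diff = (_integral(ct, pt, lo, hi) - _integral(ca, pa, lo, hi)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

As a sanity check, if a test codec reaches the same PSNR values as the anchor with exactly 80% of its bitrate at every point, `bd_rate` returns approximately -20.0. A reported average such as 26.6% is then typically the mean of per-sequence BD-Rates over a test set.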