Hierarchical B-frame Video Coding for Long Group of Pictures

Ivan Kirillov, Denis Parkhomenko, Kirill Chernyshev, Alexander Pletnev, Yibo Shi, Kai Lin, Dmitry Babin
2024-06-24
Abstract: Learned video compression methods already outperform VVC in the low-delay (LD) case, but the random-access (RA) scenario remains challenging. Most works on learned RA video compression either use HEVC as an anchor or compare against VVC only under specific test conditions, using the RGB-PSNR metric instead of Y-PSNR and avoiding comprehensive evaluation. Here, we present an end-to-end learned video codec for random access that combines training on long sequences of frames, rate allocation designed for hierarchical coding, and content adaptation at inference. We show that under common test conditions (JVET-CTC), it achieves results comparable to VTM (VVC reference software) in terms of YUV-PSNR BD-Rate on some classes of videos, and outperforms it on almost all test sets in terms of VMAF BD-Rate. On average, it surpasses open LD and RA end-to-end solutions in terms of VMAF and YUV BD-Rates.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the performance of learned video compression in the random-access (RA) scenario. Existing learning-based methods already surpass VVC (Versatile Video Coding) in the low-delay (LD) scenario, but random access remains challenging. Most work on learning-based RA video compression either uses HEVC as the anchor or compares with VVC only under specific test conditions, usually reporting RGB-PSNR instead of Y-PSNR and omitting a comprehensive evaluation. To close this gap, the authors propose an end-to-end learned video codec for random access. The codec combines training on long frame sequences, bit-rate allocation designed for hierarchical coding, and content adaptation at inference time. Under common test conditions (JVET-CTC), the model achieves YUV-PSNR BD-Rate results comparable to the VVC reference software VTM on some classes of videos, and it outperforms VTM in VMAF BD-Rate on almost all test sets.

### Main Contributions

1. **Propose an end-to-end random-access video codec**: The codec performs well in the random-access scenario and, in particular, outperforms VTM on almost all test sets in terms of VMAF BD-Rate.
2. **Introduce new training methods**: These include longer training sequences, special data sampling techniques, and loss functions, which together yield significant BD-Rate improvements.
3. **Improve the B-frame model architecture**: A Hierarchical Gain Unit (HGU) module adapts the model to the different levels of the coding structure and improves its generalization ability.
4. **Develop content-adaptation techniques for random access**: The model adjusts dynamically to the input content, further improving compression performance.
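To make "hierarchical coding" and "levels of the coding structure" concrete, the following minimal sketch enumerates the coding order and temporal level of each frame in a generic dyadic hierarchical B-frame GOP. It is an illustration under stated assumptions, not the paper's configuration: the function name `hierarchical_coding_order`, the power-of-two GOP size, and the depth-first traversal are all choices made here for the example.

```python
def hierarchical_coding_order(gop_size):
    """Return (frame_index, temporal_level) pairs in coding order for one GOP.

    Assumes a dyadic hierarchy: the two frames bounding the GOP are coded
    first (level 0), then each B-frame is coded midway between two frames
    that are already available as references.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")

    order = [(0, 0), (gop_size, 0)]  # key frames bounding the GOP

    def split(left, right, level):
        # No frame lies strictly between adjacent indices.
        if right - left < 2:
            return
        mid = (left + right) // 2
        order.append((mid, level))      # B-frame predicted from left and right
        split(left, mid, level + 1)     # refine the left half
        split(mid, right, level + 1)    # refine the right half

    split(0, gop_size, 1)
    return order


if __name__ == "__main__":
    for frame, level in hierarchical_coding_order(8):
        print(f"frame {frame}: temporal level {level}")
```

In this structure, frames at lower levels are separated from their references by larger temporal distances and are reused as references by more frames, which is the usual motivation for level-dependent rate allocation. How the paper's HGU consumes the level is not described in this summary.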
### Key Problems Solved

- **High motion variability**: The improved B-frame model and training strategy reduce the impact of motion variability between the target frame and its reference frames.
- **Complex data distribution**: Content-adaptation techniques let the model handle complex input data distributions better.
- **Lack of comprehensive evaluation**: Evaluation under common test conditions makes the results more reliable and comparable.

In conclusion, the paper addresses several key problems of learned video compression in the random-access scenario and shows potential for practical applications.
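The BD-Rate figures quoted above summarize the average bitrate difference between two codecs at equal quality. A minimal sketch of the classic Bjøntegaard computation is given below, assuming four rate-quality points per codec and a cubic fit of log-rate against quality. The function name `bd_rate` and its parameters are hypothetical, and this is not the evaluation script used in the paper; the quality axis can be any metric, such as YUV-PSNR or VMAF.

```python
import numpy as np


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Average bitrate difference (%) of the test codec versus the anchor.

    Negative values mean the test codec needs fewer bits at equal quality.
    """
    qa = np.asarray(anchor_quality, dtype=float)
    qt = np.asarray(test_quality, dtype=float)
    log_ra = np.log(np.asarray(anchor_rates, dtype=float))
    log_rt = np.log(np.asarray(test_rates, dtype=float))

    # Fit log-rate as a cubic polynomial of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Compare only over the quality interval covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")

    # Integrate each fitted curve over [lo, hi].
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```

As a sanity check, a test codec that reaches the same quality points at exactly 90% of the anchor's bitrate should give a BD-Rate of about -10%.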