STRANet: Soft-Target and Restriction-Aware Neural Network for Efficient VVC Intra Coding

Tianyi Sun, Yanze Wang, Zhijie Huang, Jun Sun
DOI: https://doi.org/10.1109/tcsvt.2024.3428474
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The Versatile Video Coding (VVC) standard introduces a quad-tree with nested multi-type tree (QTMT) partition structure to improve rate-distortion (RD) performance, but this substantially increases encoding complexity. Previous studies have labeled the partition modes of coding units (CUs) with hard targets (i.e., one-hot labels) generated by the VVC reference software (VTM), which are challenging for neural networks to predict accurately. Furthermore, earlier works do not incorporate the VVC partition restrictions into the convolutional neural network (CNN), and so do not fully exploit its predictive capacity. In this paper, we propose a novel soft-target and restriction-aware neural network (STRANet) to address these issues. First, inspired by the observation that a CU may split differently under various circumstances, we collect the RD costs from these cases and precisely estimate the probability of each partition mode to generate a soft target. Second, our neural network incorporates the quantization parameter (QP) and restriction type through attention modules, so that its predictions become standard-compliant with simple post-processing. Third, a Window Attention Module, which combines a CNN with an attention mechanism, is adopted to further enhance performance on GPU. With these methods, STRANet reduces encoding time by 51.84% and 61.00% with Bjøntegaard delta bit-rate (BD-BR) increases of 0.44% and 0.84%, respectively, outperforming state-of-the-art methods. The code has been released at https://github.com/cppppp/STRANet.
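The contrast between a one-hot hard target and a soft target derived from RD costs, followed by restriction-aware post-processing, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the probability estimator, so a softmax over relative RD costs stands in for it, and the function names, the `temperature` parameter, and the mode labels are hypothetical rather than taken from the STRANet code.

```python
import math

# Illustrative labels for the QTMT partition modes of a CU (hypothetical names):
# no split, quad-tree, horizontal/vertical binary, horizontal/vertical ternary.
MODES = ["NS", "QT", "BT_H", "BT_V", "TT_H", "TT_V"]


def soft_target(rd_costs, temperature=0.1):
    """Map per-mode RD costs to a probability distribution.

    Assumption: a softmax over costs normalized by the best cost.
    This is a stand-in, not the estimator used in the paper.
    """
    best = min(rd_costs.values())
    if best <= 0:
        raise ValueError("RD costs are expected to be positive")
    weights = {
        mode: math.exp(-(cost - best) / (temperature * best))
        for mode, cost in rd_costs.items()
    }
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


def apply_restrictions(probs, allowed):
    """Zero out modes the standard forbids for this CU, then renormalize."""
    masked = {m: (p if m in allowed else 0.0) for m, p in probs.items()}
    total = sum(masked.values())
    if total == 0.0:
        raise ValueError("no allowed mode has non-zero probability")
    return {m: p / total for m, p in masked.items()}


# Made-up RD costs for one CU; a hard target would keep only the minimum.
costs = dict(zip(MODES, [100.0, 105.0, 120.0, 110.0, 150.0, 140.0]))
target = soft_target(costs)
compliant = apply_restrictions(target, allowed={"NS", "BT_H", "BT_V"})
```

A lower `temperature` pushes the distribution toward the one-hot label, while a higher one spreads probability across modes whose RD costs are close to the best; the masking step only mirrors the idea of making predictions standard-compliant and does not reproduce the paper's post-processing.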