CNN-based ternary tree partition approach for VVC intra-QTMT coding

Fatma Belghith, Bouthaina Abdallah, Sonda Ben Jdidia, Mohamed Ali Ben Ayed, Nouri Masmoudi
DOI: https://doi.org/10.1007/s11760-024-03023-5
IF: 1.583
2024-02-17
Signal Image and Video Processing
Abstract: In July 2020, the Joint Video Experts Team published the Versatile Video Coding (VVC) standard. The VVC encoder improves coding efficiency over its predecessor, the High-Efficiency Video Coding (HEVC) encoder, thanks to improved coding modules and new techniques such as the block partitioning structure called quadtree with nested multi-type tree (QTMT). However, QTMT significantly increases encoding time, mainly at the rate-distortion optimization (RDO) level, which results in enormous computational complexity. In place of the RDO-based QTMT partition process, a deep-QTMT partition approach based on a fast convolutional neural network-ternary tree (CNN-TT) is proposed to predict the best intra-QTMT decision tree and thereby reduce the encoding time. A database containing CU-based TT partition depths is first established from several video contents. Then, a CNN-TT model is developed over the three levels provided by the TT structure to determine the QTMT partition early at the 32 × 32 CU size. Different threshold values are fixed for each level according to the CNN-TT predicted probabilities to balance encoding complexity against coding efficiency. The experimental results show that the deep-QTMT partition approach saves between 23% and 58% of encoding time on average with acceptable RD performance.
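The per-level thresholding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the CNN-TT model outputs one split probability per TT level for each 32 × 32 CU and that the descent stops at the first level whose probability falls below its threshold. The function name, the probability layout, and the threshold values are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# Hypothetical per-level thresholds; the values used in the paper are not given here.
DEFAULT_THRESHOLDS = (0.5, 0.5, 0.5)


def predict_tt_depth(split_probs: Sequence[float],
                     thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> int:
    """Return an assumed TT depth (0 to 3) for one 32x32 CU.

    split_probs[i] is taken to be the predicted probability that the CU
    is split at level i + 1. The descent stops at the first level whose
    probability is below the threshold for that level.
    """
    if len(split_probs) != len(thresholds):
        raise ValueError("one probability is required per threshold level")
    depth = 0
    for prob, threshold in zip(split_probs, thresholds):
        if prob < threshold:
            break  # not confident enough to split further at this level
        depth += 1
    return depth


if __name__ == "__main__":
    # Illustrative probabilities only, not model output.
    print(predict_tt_depth([0.9, 0.8, 0.2]))
```

In this sketch, raising a level's threshold makes the predictor stop earlier at that level, which would skip more partition checks at the risk of a larger RD loss. That is one way the complexity and efficiency balance described above could be tuned.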
engineering, electrical & electronic, imaging science & photographic technology
What problem does this paper attempt to address?
This paper addresses the sharp increase in encoding time and computational complexity that the new block partitioning structure, QTMT (quadtree with nested multi-type tree), introduces in the VVC (Versatile Video Coding) standard. Specifically:

1. **Challenges brought by the improvement of VVC encoding efficiency**:
   - The VVC encoder significantly improves encoding efficiency over its predecessor, the HEVC (High-Efficiency Video Coding) encoder. This is mainly due to improved encoding modules and new techniques such as the QTMT block partitioning structure.
   - However, the QTMT structure adds substantial computational complexity at the rate-distortion optimization (RDO) level, which significantly increases the encoding time.
2. **Limitations of existing methods**:
   - Various machine-learning and deep-learning methods have been proposed to accelerate the CU (Coding Unit) partitioning decision, but they have not fully studied the TT (Ternary Tree) structure.
3. **The proposed new method**:
   - The paper proposes a fast QTMT partitioning method based on a convolutional neural network (CNN), called CNN-TT (Convolutional Neural Network-Ternary Tree), to predict the optimal intra-QTMT decision tree and thereby reduce the encoding time.
   - A database of CU-based TT partitioning depths is built from different video contents, and a three-level CNN-TT model is developed to determine the QTMT partitioning early. This reduces the encoding time while maintaining acceptable rate-distortion performance.

### Main contributions

- **Database generation**: A large database containing the TT partitioning depths of 32 × 32 CUs was constructed.
- **CNN-TT model development**: A CNN-TT classifier based on the TT structure was developed and trained to predict the TT partitioning of 32 × 32 CUs.
- **Model application**: The trained CNN-TT model was applied to the VVC encoder to predict the QTMT decision tree.

### Experimental results

The experimental results show that the method saves 23% to 58% of the encoding time on average while maintaining acceptable rate-distortion performance. This supports its effectiveness in reducing encoding complexity.

In summary, the paper mitigates the high computational complexity and long encoding time that the QTMT structure causes in VVC encoding by introducing the CNN-TT model, offering a new way to speed up VVC intra coding.
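For context on how a figure such as "23% to 58% of the encoding time" is typically obtained, the sketch below uses the conventional time-saving ratio: the reduction in encoding time relative to the unmodified reference encoder. That the paper uses exactly this definition is an assumption; the function name and the timings in the usage line are hypothetical and are not measured results.

```python
def time_saving_percent(t_reference: float, t_proposed: float) -> float:
    """Encoding-time saving of a fast method relative to a reference encoder.

    Assumed definition: 100 * (T_ref - T_prop) / T_ref, with both times
    measured under the same sequence and encoder configuration.
    """
    if t_reference <= 0:
        raise ValueError("reference encoding time must be positive")
    return 100.0 * (t_reference - t_proposed) / t_reference


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    print(time_saving_percent(t_reference=200.0, t_proposed=154.0))
```

A reported range would then correspond to the smallest and largest averages of this ratio across the tested configurations.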