Divide-and-Conquer-Based RDO-Free CU Partitioning for 8K Video Compression
Hang Yuan, Wei Gao, Siwei Ma, Yiqiang Yan
DOI: https://doi.org/10.1145/3634705
IF: 4.094
2024-01-01
ACM Transactions on Multimedia Computing, Communications, and Applications
Abstract: 8K (7680 x 4320) ultra-high-definition (UHD) video is becoming increasingly popular as demand for a better visual experience grows. Compressing 8K UHD video has therefore become a top priority for the third-generation audio video coding standard (AVS3). However, real-time hardware implementation of AVS3-based 8K UHD intra coding, an important part of promoting the standard, is severely hindered, especially at the coding unit (CU) partition stage. To overcome this limitation, this article proposes a divide-and-conquer-based rate-distortion-optimization-free (RDO-free) CU partitioning algorithm for efficient hardware implementation. To address the complex CU partitioning in AVS3, we design two components: a lightweight optimization of the original partitioning rules to improve partitioning efficiency, and a decision-tree-based RDO-free CU decision framework to eliminate the latency of waiting for rate-distortion cost calculation in the RDO strategy. A divide-and-conquer-based, hardware-friendly gradient difference calculation approach is then devised to speed up extraction of the learning features. To ensure that the proposed algorithm can support real-time CU partitioning for 8K videos, we also develop an FPGA-based hardware architecture. Experimental results show that the software coding performance of our algorithm is significantly ahead of the efficient AVS3 implementation uAVS3e and of HM-16.20, the reference software for High Efficiency Video Coding (HEVC), although there is a 9.96% loss in BD-Rate Y. Considering the algorithm's importance for the hardware implementation of an AVS3-based 8K real-time encoder, this coding loss is acceptable. Moreover, hardware simulation results on a VU440 FPGA with Vivado 2019 show that our algorithm can support CU partitioning for 8K UHD videos at 61.12 frames per second (fps) while consuming only 0.00%, 0.00%, 1.01%, and 7.78% of the BRAM_18K, DSP48E, FF, and LUT resources, respectively. Additionally, with dual-path parallelism, 122.24 fps can be achieved with controllable resource utilization, which is state-of-the-art performance.