An Interpolation-Free Fractional Motion Estimation Algorithm and Hardware Implementation for VVC
Shushi Chen, Leilei Huang, Zhao Zan, Xiaoyang Zeng, Yibo Fan
DOI: https://doi.org/10.1109/tvlsi.2024.3455374
2024-01-01
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Abstract: Versatile video coding (VVC) introduces the multi-type tree (MTT) and a larger coding tree unit (CTU) to improve compression efficiency over its predecessor, High Efficiency Video Coding (HEVC). As a result, fractional motion estimation (FME) must sustain higher throughput to meet the needs of real-time processing. In this context, this article proposes an interpolation-free algorithm based on an error surface to improve the throughput of FME hardware. The error surface is constructed from the rate-distortion costs (RDCs) of the integer motion vector (IMV) and its neighbors. To improve the prediction accuracy, a hardware-friendly RDC estimation strategy is proposed for constructing the error surface. Experimental results show that, compared with the VVC test model (VTM) 16.0, the Bjøntegaard Delta Bit Rate (BDBR) increases by only 0.358%, 0.479%, and 0.511% in the Random Access (RA), Low Delay P (LDP), and Low Delay B (LDB) configurations, respectively. Compared with the default FME algorithm of VVC, the time cost of FME is reduced by 53.47%, 56.28%, and 54.23% in the RA, LDP, and LDB configurations, respectively. The algorithm requires neither iteration nor interpolation, which contributes to low-cost, high-throughput hardware. The proposed architecture supports FME of all coding units (CUs) in a CTU with one layer of MTT under the quaternary tree (QT), with CU sizes ranging from 8 × 8 to 128 × 128. Synthesized in a GF 28-nm process, the architecture achieves a throughput of 7680 × 4320 @ 60 fps at 800 MHz, with a gate count of 244 K and a power consumption of 76.5 mW. The proposed architecture can therefore meet the real-time coding requirements of VVC.
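To illustrate the general idea of an error-surface approach, the following minimal sketch estimates a quarter-pel offset from the RDC at the IMV and its four horizontal and vertical neighbors, without interpolating any reference samples. It is not the authors' algorithm: the abstract does not specify the surface model, so a separable one-dimensional parabola per axis is assumed here, and the function names, the cost dictionary layout, the quarter-pel output precision, and the clamp to half a pixel are all hypothetical choices made for illustration.

```python
import math


def parabolic_offset(c_minus, c_center, c_plus):
    """Return the abscissa of the minimum of the parabola passing through
    (-1, c_minus), (0, c_center), (1, c_plus), in integer-pixel units.

    Assumption: a 1-D quadratic error surface; the paper's model may differ.
    """
    curvature = c_minus - 2 * c_center + c_plus
    if curvature <= 0:
        # Flat or concave fit: no reliable minimum, stay at the integer position.
        return 0.0
    offset = (c_minus - c_plus) / (2 * curvature)
    # Guard (assumption): keep the estimate within half a pixel of the IMV.
    return max(-0.5, min(0.5, offset))


def to_quarter_pel(offset):
    """Round an offset in pixels to the nearest quarter-pel step (half rounds up)."""
    return math.floor(offset * 4 + 0.5)


def estimate_fractional_mv(costs):
    """Estimate (dx, dy) in quarter-pel units from five costs.

    `costs` is a hypothetical dict with keys 'center', 'left', 'right',
    'up', 'down' holding the RDC at the IMV and its four neighbors.
    """
    dx = parabolic_offset(costs["left"], costs["center"], costs["right"])
    dy = parabolic_offset(costs["up"], costs["center"], costs["down"])
    return to_quarter_pel(dx), to_quarter_pel(dy)


if __name__ == "__main__":
    example = {"center": 100, "left": 120, "right": 110, "up": 130, "down": 130}
    print(estimate_fractional_mv(example))
```

Because the estimate is a closed-form expression of a few costs already available from integer motion estimation, a scheme of this kind needs no iterative search and no sub-pixel sample interpolation, which is the property the abstract credits for the low hardware cost.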