Affine Motion Estimation Hardware Implementation with 51.7% / 67.5% Internal Bandwidth Reduction for Versatile Video Coding
Shushi Chen, Leilei Huang, Zhao Zan, Zhijian Hao, Hao Zhang, Xiaoxiang Chen, Minge Jing, Xiaoyang Zeng, Yibo Fan
DOI: https://doi.org/10.1109/tcsvt.2024.3507375
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Versatile Video Coding (VVC) employs Affine Motion Compensation (AMC) to handle scenes with high-order motion. To improve AMC efficiency, the VVC Test Model (VTM) introduces an Affine Motion Estimation (AME) process based on a gradient-based iterative algorithm (GIA) and a block-matching algorithm (BMA). However, the AME process is highly complex and difficult to implement in hardware for real-time applications. This paper therefore proposes a hardware-friendly AME algorithm and implements the corresponding accelerator. First, weighted least squares regression is used to reduce the number of GIA iterations. Then, an iteration-free search scheme is proposed to remove the search dependence in the GIA and BMA processes. In addition, a motion vector clamping mechanism and a four-level memory organization are proposed to resolve reference pixel read conflicts, which reduce the internal bandwidth of the AME accelerator by 51.7% and 67.5%. Compared with the default AME process of VTM 16.0, experimental results show that the proposed algorithm reduces AME run time by 81.63% with a Bjøntegaard Delta Bit Rate (BDBR) loss of only 0.492%. The proposed AME accelerator flexibly supports AME search tasks in various configurations. Synthesized with the TSMC 28 nm process, the proposed architecture has a gate count of 1313K and a power consumption of 156.83 mW. It achieves 7680×4320 at 1.7 fps to 30 fps, with a corresponding BDBR loss of 0.492% to 1.835%.
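
To make the affine model and the idea of motion vector clamping concrete, the following is a minimal sketch and not the paper's method. It assumes the standard VVC 4-parameter affine model, in which two control-point motion vectors at the top-left and top-right corners define one motion vector per 4×4 sub-block. The clamping rule shown (limiting every sub-block vector to a window around the mean vector of the block) is a hypothetical illustration of why clamping bounds the reference region that must be fetched; the abstract does not specify the actual mechanism. The names `subblock_mvs`, `clamp_mvs`, and `radius` are illustrative only.

```python
from typing import List, Tuple

MV = Tuple[float, float]


def subblock_mvs(cpmv0: MV, cpmv1: MV, width: int, height: int,
                 sub: int = 4) -> List[List[MV]]:
    """Derive one MV per sub-block from two control-point MVs
    (4-parameter affine model, evaluated at each sub-block centre)."""
    a = (cpmv1[0] - cpmv0[0]) / width  # horizontal gradient of mv_x
    b = (cpmv1[1] - cpmv0[1]) / width  # horizontal gradient of mv_y
    field = []
    for y0 in range(0, height, sub):
        row = []
        for x0 in range(0, width, sub):
            x = x0 + sub / 2
            y = y0 + sub / 2
            mv_x = a * x - b * y + cpmv0[0]
            mv_y = b * x + a * y + cpmv0[1]
            row.append((mv_x, mv_y))
        field.append(row)
    return field


def clamp_mvs(field: List[List[MV]], radius: float) -> List[List[MV]]:
    """Hypothetical clamp (an assumption, not the paper's rule):
    keep every sub-block MV within +/- radius of the mean MV."""
    count = sum(len(row) for row in field)
    cx = sum(mv[0] for row in field for mv in row) / count
    cy = sum(mv[1] for row in field for mv in row) / count

    def clip(v: float, centre: float) -> float:
        return max(centre - radius, min(centre + radius, v))

    return [[(clip(mx, cx), clip(my, cy)) for mx, my in row] for row in field]


# Example: a 16x16 block whose control points describe a pure zoom.
field = subblock_mvs((0.0, 0.0), (16.0, 0.0), width=16, height=16)
clamped = clamp_mvs(field, radius=4.0)
```

With a bounded spread of sub-block vectors, the reference pixels needed by all sub-blocks fall inside a fixed-size window, which is the general reason a clamp can simplify on-chip reference memory access; how the paper combines this with its four-level memory organization is not described in the abstract.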