Deep Multi-Task Learning Based Fast Intra-Mode Decision for Versatile Video Coding

Zheng Liu, Tianyi Li, Ying Chen, Kaijin Wei, Mai Xu, Honggang Qi
DOI: https://doi.org/10.1109/tcsvt.2023.3262733
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The latest Versatile Video Coding (VVC) standard significantly improves coding efficiency over its predecessor, the High Efficiency Video Coding (HEVC) standard, but at the expense of much higher complexity. As measured by the VVC test model (VTM), the intra-mode comparison and selection in the rate-distortion optimization (RDO) search consume most of the encoding time. In this paper, we propose a deep multi-task learning based fast intra-mode decision approach that adaptively prunes most of the redundant modes. First, we create a large-scale intra-mode database for VVC, including both normal angular modes and the newly introduced tools, i.e., intra sub-partition (ISP) and matrix-based intra prediction (MIP). Next, we propose a multi-task intra-mode decision network (MID-Net) model to predict the most probable angular modes and whether to skip the ISP and MIP modes. Then, a fast intra-coding workflow is designed accordingly, involving rough mode decision (RMD) acceleration and candidate mode list (CML) pruning. For the workflow output, the learning-oriented probability and the statistics-oriented probability are combined to further improve the prediction accuracy, ensuring that only unnecessary intra-modes are skipped. Finally, experimental results show that our approach reduces the encoding time of VVC intra-coding by 40.48% with negligible rate-distortion degradation, outperforming other state-of-the-art approaches.
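The abstract does not say how the learning-oriented and statistics-oriented probabilities are combined or how the CML is pruned. The sketch below is only an illustration under stated assumptions: it blends the two estimates with a convex combination and keeps the highest-ranked candidate modes until a cumulative-probability target is met. The function names, the `weight` and `keep_mass` parameters, and the fusion rule itself are hypothetical and are not taken from the paper or from VTM.

```python
from typing import Dict, List


def fuse_probabilities(
    learned: Dict[int, float],
    statistical: Dict[int, float],
    weight: float = 0.5,
) -> Dict[int, float]:
    """Blend two per-mode probability estimates (assumed convex combination)."""
    modes = set(learned) | set(statistical)
    fused = {
        m: weight * learned.get(m, 0.0) + (1.0 - weight) * statistical.get(m, 0.0)
        for m in modes
    }
    total = sum(fused.values())
    if total == 0.0:
        return fused  # nothing to normalize
    return {m: p / total for m, p in fused.items()}


def prune_candidate_modes(
    candidates: List[int],
    fused: Dict[int, float],
    keep_mass: float = 0.9,
    min_keep: int = 1,
) -> List[int]:
    """Keep top-ranked candidates until their cumulative probability reaches keep_mass."""
    ranked = sorted(candidates, key=lambda m: fused.get(m, 0.0), reverse=True)
    kept: List[int] = []
    mass = 0.0
    for mode in ranked:
        # Stop once enough probability mass is covered and the minimum is kept.
        if len(kept) >= min_keep and mass >= keep_mass:
            break
        kept.append(mode)
        mass += fused.get(mode, 0.0)
    return kept


# Hypothetical probabilities for four intra-mode indices (illustrative values only).
learned = {0: 0.6, 1: 0.3, 50: 0.1}
statistical = {0: 0.4, 1: 0.2, 18: 0.4}
fused = fuse_probabilities(learned, statistical, weight=0.5)
kept = prune_candidate_modes([0, 1, 18, 50], fused, keep_mass=0.9)
```

In this toy setting, a lower `keep_mass` prunes more aggressively (fewer modes reach the full RDO check) at a higher risk of discarding the best mode; the paper's actual trade-off mechanism may differ.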