Advanced Learning-Based Inter Prediction for Future Video Coding

Yanchen Zhao, Wenhong Duan, Chuanmin Jia, Shanshe Wang, Siwei Ma
2024-11-24
Abstract: In the fourth generation Audio Video coding Standard (AVS4), the Inter Prediction Filter (INTERPF) reduces discontinuities between prediction and adjacent reconstructed pixels in inter prediction. The paper proposes a low complexity learning-based inter prediction (LLIP) method to replace the traditional INTERPF. LLIP enhances the filtering process by leveraging a lightweight neural network model whose parameters can be exported for efficient inference. Specifically, we extract the pixels and coordinates utilized by the traditional INTERPF to form the training dataset. Subsequently, we export the weights and biases of the trained neural network model and implement the inference process without any third-party dependency, enabling seamless integration into a video codec without relying on Libtorch and thus achieving faster inference speed. Ultimately, we replace the traditional handcrafted filtering parameters in INTERPF with the learned optimal filtering parameters. This practical solution makes the combination of deep learning coding tools with traditional video coding schemes more efficient. Experimental results show that our approach achieves 0.01%, 0.31%, and 0.25% coding gain for the Y, U, and V components under the random access (RA) configuration on average.
Multimedia, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses a limitation of the fourth-generation Audio Video coding Standard (AVS4): the hand-designed Inter Prediction Filter (INTERPF), which is built on statistical priors, is not efficient enough for compressing ultra-high-definition video. The traditional INTERPF does reduce the discontinuity between the prediction block and the adjacent reconstructed pixels, but its benefit is limited, especially across different types of video content, because hand-designed algorithms adapt poorly to diverse video data.

To overcome this limitation, the paper proposes a low-complexity learning-based inter prediction method (LLIP) that replaces the traditional INTERPF with a lightweight neural network model. Specifically, LLIP builds its training data from the pixel values and coordinates used by the traditional INTERPF, trains a lightweight fully-connected neural network, and exports the trained model parameters so that inference can run efficiently. The method keeps complexity comparable to that of traditional coding tools while improving coding efficiency: under the random access (RA) configuration, the average coding gains for the Y, U, and V components are 0.01%, 0.31%, and 0.25% respectively.

The main contributions of the paper include:
1. Proposing a lightweight fully-connected network that can replace the traditional INTERPF and achieve coding gains while maintaining a complexity comparable to that of traditional coding tools.
2. Implementing a model-inference library without third-party dependencies, which helps combine neural-network-based coding tools with traditional codecs and runs inference faster than the Libtorch library.
3. Proposing an effective method to optimize traditional coding tools using neural networks, providing a new direction for exploring next-generation video coding tools.

Through these improvements, the paper provides a practical solution for improving the efficiency and quality of video coding, especially when dealing with ultra-high-definition videos.
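
The export-and-inference step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the JSON parameter format, the function names (`export_params`, `load_params`, `mlp_forward`, `clip_pixel`), the ReLU activation, the layer sizes, and the 10-bit clipping default are all assumptions made for illustration. The only point it demonstrates is that a small fully-connected network can be evaluated from exported weights and biases using the standard library alone, with no deep learning framework at inference time.

```python
import json

def export_params(layers):
    """Serialize (weight, bias) pairs to a JSON string (hypothetical format)."""
    return json.dumps([{"weight": w, "bias": b} for w, b in layers])

def load_params(text):
    """Rebuild the list of (weight, bias) pairs from the JSON string."""
    return [(layer["weight"], layer["bias"]) for layer in json.loads(text)]

def relu(values):
    """Element-wise ReLU (assumed activation; the source does not name one)."""
    return [v if v > 0.0 else 0.0 for v in values]

def dense(weight, bias, x):
    """One fully-connected layer; weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def mlp_forward(layers, x):
    """Run the network: ReLU after every layer except the last."""
    for index, (weight, bias) in enumerate(layers):
        x = dense(weight, bias, x)
        if index < len(layers) - 1:
            x = relu(x)
    return x

def clip_pixel(value, bit_depth=10):
    """Round and clamp a filtered value to the valid sample range."""
    return max(0, min((1 << bit_depth) - 1, int(round(value))))

# Toy parameters standing in for a trained model (3 inputs, 2 hidden, 1 output).
toy_layers = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]], [0.0, 1.0]),
    ([[1.0, 2.0]], [0.5]),
]

# Export once after training, then load inside the codec-side code.
restored = load_params(export_params(toy_layers))

# Hypothetical input vector; in LLIP the inputs are pixels and coordinates.
output = mlp_forward(restored, [2.0, 4.0, 3.0])
```

Writing the layers as plain nested lists keeps the arithmetic easy to port to C or C++ inside a codec, which is where a dependency-free implementation would actually live.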