Abstract: In the fourth-generation Audio Video coding Standard (AVS4), the Inter Prediction Filter (INTERPF) reduces discontinuities between the prediction block and adjacent reconstructed pixels in inter prediction. This paper proposes a low-complexity learning-based inter prediction (LLIP) method to replace the traditional INTERPF. LLIP enhances the filtering process with a lightweight neural network model whose parameters can be exported for efficient inference. Specifically, we extract the pixels and coordinates used by the traditional INTERPF to form the training dataset. We then export the weights and biases of the trained neural network model and implement the inference process without any third-party dependency. This allows seamless integration into the video codec without relying on Libtorch and yields faster inference. Finally, we replace the traditional handcrafted filtering parameters in INTERPF with the learned optimal filtering parameters. This practical solution makes combining deep-learning coding tools with traditional video coding schemes more efficient. Experimental results show that our approach achieves average coding gains of 0.01%, 0.31%, and 0.25% for the Y, U, and V components under the random access (RA) configuration.
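The abstract's key engineering point is that exported weights and biases can be evaluated with plain arithmetic, with no deep-learning runtime. The sketch below illustrates that idea under stated assumptions. The names `dense`, `mlp_forward`, and `llip_filter_pixel` are hypothetical. The feature layout is also an assumption, not the paper's confirmed design: the inter-prediction sample, the reconstructed neighbors above and to the left, and the normalized pixel coordinates. So are the ReLU hidden layers and the linear output layer.

```python
from typing import List, Sequence, Tuple

# One exported layer: (weights[out][in], bias[out]). This layout is an assumption.
Layer = Tuple[List[List[float]], List[float]]


def dense(x: Sequence[float], layer: Layer, relu: bool) -> List[float]:
    """Compute y = W x + b for one fully connected layer, optionally with ReLU."""
    weights, bias = layer
    out: List[float] = []
    for row, b in zip(weights, bias):
        acc = b + sum(w * v for w, v in zip(row, x))
        out.append(max(acc, 0.0) if relu else acc)
    return out


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Run every layer in order; ReLU on all layers except the last (linear) one."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = dense(h, layer, relu=(i < len(layers) - 1))
    return h


def llip_filter_pixel(pred: int, top: int, left: int, x: int, y: int,
                      width: int, height: int, layers: Sequence[Layer],
                      bit_depth: int = 8) -> int:
    """Hypothetical per-pixel learned filter.

    Features (assumed): inter-prediction sample, reconstructed pixel above the
    block in the same column, reconstructed pixel left of the block in the same
    row, and (x, y) normalized by the block size. Returns a clipped integer sample.
    """
    max_val = (1 << bit_depth) - 1
    feats = [pred / max_val, top / max_val, left / max_val, x / width, y / height]
    value = mlp_forward(feats, layers)[0] * max_val
    return min(max(int(round(value)), 0), max_val)
```

For example, a one-unit network that copies the prediction feature returns the prediction unchanged. With layers `[([[1.0, 0.0, 0.0, 0.0, 0.0]], [0.0]), ([[1.0]], [0.0])]`, calling `llip_filter_pixel(100, 200, 0, 0, 0, 8, 8, layers)` gives 100. Inside a real codec the same loops would more likely be written in C/C++, possibly with fixed-point arithmetic. This Python version only shows the dependency-free structure.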

What problem does this paper attempt to address?

The problem this paper attempts to solve concerns the fourth-generation Audio Video coding Standard (AVS4). There, the traditional inter prediction filter (INTERPF), which is hand-designed from statistical priors, cannot efficiently compress ultra-high-definition videos. The traditional INTERPF does reduce the discontinuity between the prediction block and the adjacent reconstructed pixels, but its performance gain is limited, especially across different types of video content, because hand-designed algorithms struggle to adapt to diverse video data. To overcome this limitation, the paper proposes a low-complexity learning-based inter prediction (LLIP) method that optimizes the traditional INTERPF with a lightweight neural network model. Specifically, LLIP builds training data from the pixel values and coordinates used by the traditional INTERPF, trains a lightweight fully connected neural network, and exports the trained model parameters for efficient inference. The method keeps complexity comparable to that of traditional coding tools while improving coding efficiency, with most of the gain in the chroma components. Under the random access (RA) configuration, the average coding gains for the Y, U, and V components are 0.01%, 0.31%, and 0.25%, respectively.

The main contributions of the paper include:
1. A lightweight fully connected network that replaces the traditional INTERPF and achieves coding gains while keeping complexity comparable to that of traditional coding tools.
2. A model-inference library without third-party dependencies, which makes it easier to combine neural-network-based coding tools with traditional codecs and runs faster than inference through the Libtorch library.
3. An effective method for optimizing traditional coding tools with neural networks, which offers a new direction for exploring next-generation video coding tools.

Through these improvements, the paper provides a practical solution for improving the efficiency and quality of video coding, especially for ultra-high-definition videos.
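The summary describes two offline steps: building training pairs from the pixels and coordinates that INTERPF uses, and exporting the trained parameters for the codec. The sketch below shows both under stated assumptions. The names `extract_samples` and `export_as_c_array` are hypothetical. The block is assumed to be given as 2-D lists. The reconstructed row above and column to the left are assumed to be available. The regression target is assumed to be the original (uncompressed) sample. Emitting parameters as a C `static const float` array is only one plausible export format, not the paper's documented one.

```python
from typing import List, Sequence, Tuple

# (feature vector, target) for one pixel; the feature layout is an assumption.
Sample = Tuple[List[float], float]


def extract_samples(pred: Sequence[Sequence[int]],
                    orig: Sequence[Sequence[int]],
                    top_row: Sequence[int],
                    left_col: Sequence[int],
                    bit_depth: int = 8) -> List[Sample]:
    """Return one (features, target) training pair per pixel of the block."""
    height, width = len(pred), len(pred[0])
    max_val = (1 << bit_depth) - 1
    samples: List[Sample] = []
    for y in range(height):
        for x in range(width):
            feats = [pred[y][x] / max_val,   # inter-prediction sample
                     top_row[x] / max_val,   # reconstructed pixel above, same column
                     left_col[y] / max_val,  # reconstructed pixel to the left, same row
                     x / width,              # normalized horizontal coordinate
                     y / height]             # normalized vertical coordinate
            samples.append((feats, orig[y][x] / max_val))
    return samples


def export_as_c_array(name: str, values: Sequence[float]) -> str:
    """Format a flat list of trained parameters as a C float array definition."""
    body = ", ".join(f"{v!r}f" for v in values)
    return f"static const float {name}[{len(values)}] = {{{body}}};"
```

A training script would collect `extract_samples` output from many encoded blocks, fit the network with any framework, and then flatten each layer's weights and biases through `export_as_c_array` so the codec can compile them in. Keeping the export as plain source text is what removes the runtime dependency on a library such as Libtorch.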

Advanced Learning-Based Inter Prediction for Future Video Coding

Towards Next Generation Video Coding: from Neural Network Based Predictive Coding to In-Loop Filtering

Neural Network-Based Enhancement to Inter Prediction for Video Coding

Interweaved Prediction for Video Coding

High-Order Intra Prediction for Future Video Coding

Multi-Scale Convolutional Neural Network-Based Intra Prediction for Video Coding

Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding

Probability-Based Fast Intra Prediction Algorithm for Spatial SHVC

Texture and Correlation Based Fast Intra Prediction Algorithm for HEVC

Efficient GPU-Based Inter Prediction for Video Decoder

A 16-Pixel Parallel Architecture with Block-Level/Mode-Level Co-Reordering Approach for Intra Prediction in 4K×2K H.264/AVC Video Encoder

Enhanced CTU-Level Inter Prediction with Deep Frame Rate Up-Conversion for High Efficiency Video Coding

CNN-Based Inter Prediction Refinement for AVS3

Spatial Decomposition and Temporal Fusion based Inter Prediction for Learned Video Compression

Inheritability-Inspired Intra Coding Optimization for AVS3

Neural Network Based Inter Prediction for HEVC

Optimized Spatial Recurrent Network for Intra Prediction in Video Coding

Fast Intra Prediction Algorithm for Quality Scalable Video Coding

Enhanced Line-Based Intra Prediction with Fixed Interpolation Filtering

Enhanced Inter Prediction with Localized Weighted Prediction in HEVC

Bi-Intra Prediction for Versatile Video Coding