Motion-Adaptive Inference for Flexible Learned B-Frame Compression

M. Akin Yilmaz, O. Ugur Ulas, Ahmet Bilican, A. Murat Tekalp
2024-02-13
Abstract: While the performance of recent learned intra and sequential video compression models exceeds that of the respective traditional codecs, the performance of learned B-frame compression models generally lags behind traditional B-frame coding. The performance gap is larger for complex scenes with large motion. This is because, in hierarchical B-frame compression, the distance between the past and future references varies with the level of hierarchy, which causes the motion range to vary. The inability of a single B-frame compression model to adapt to various motion ranges causes a loss of performance. As a remedy, we propose controlling the motion range for flow prediction during inference (to approximately match the range of motion in the training data) by downsampling video frames adaptively according to the amount of motion and the level of hierarchy, in order to compress all B-frames using a single flexible-rate model. We present state-of-the-art BD-rate results to demonstrate the superiority of our proposed single-model motion-adaptive inference approach over all existing learned B-frame compression models.
Image and Video Processing
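The adaptive downsampling idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual selection rule: it assumes a dyadic hierarchical GOP (frame t is coded from frames t - d and t + d, where d is the largest power of two dividing t), a known per-frame motion estimate in pixels, and a fixed maximum displacement seen during training. The function names, the candidate scales (1, 2, 4, 8), and all numeric values are hypothetical.

```python
from typing import Sequence


def reference_distance(t: int) -> int:
    """Distance d from B-frame t to each reference (t - d and t + d) in a dyadic GOP.

    d is the largest power of two dividing t: t=8 -> 8, t=12 -> 4, t=3 -> 1.
    """
    if t <= 0:
        raise ValueError("t must be a positive frame index inside the GOP")
    return t & -t


def hierarchy_level(t: int, gop_size: int) -> int:
    """Level 1 is the GOP's middle frame; deeper levels have closer references."""
    return (gop_size // reference_distance(t)).bit_length() - 1


def select_scale(motion_per_frame: float, t: int, train_range: float,
                 scales: Sequence[int] = (1, 2, 4, 8)) -> int:
    """Return the smallest (hypothetical) downsampling factor whose expected
    past-to-future displacement, in downsampled pixels, fits the training range."""
    # The past and future references of frame t are 2*d frames apart, so the
    # displacement the flow predictor must handle grows at higher hierarchy levels.
    expected_motion = motion_per_frame * 2 * reference_distance(t)
    for s in scales:
        if expected_motion / s <= train_range:
            return s
    return scales[-1]  # motion still too large: fall back to the coarsest scale


if __name__ == "__main__":
    GOP_SIZE = 16       # hypothetical GOP length
    TRAIN_RANGE = 16.0  # hypothetical max displacement (pixels) seen in training
    MOTION = 3.0        # hypothetical estimated motion, pixels per frame
    for t in (8, 4, 2, 1):
        # prints: frame index, hierarchy level, chosen downsampling factor
        print(t, hierarchy_level(t, GOP_SIZE), select_scale(MOTION, t, TRAIN_RANGE))
```

With the same per-frame motion, this rule codes the middle frame (t = 8, level 1) at 1/4 resolution, the level-2 frame at 1/2 resolution, and the deepest levels at full resolution, reflecting the abstract's point that the required downsampling depends on both the amount of motion and the hierarchy level.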
What problem does this paper attempt to address?
The main problem this paper addresses is the weak performance of current learning-based bidirectional (B-frame) video compression models on complex scenes with large motion. In hierarchical B-frame coding, the distance between the past and future reference frames changes with the temporal hierarchy level, so the range of motion the flow predictor must handle changes as well. A single learned B-frame compression model has difficulty adapting to all of these motion ranges, which degrades performance. To address this, the authors propose motion-adaptive inference: during inference, video frames are adaptively downsampled according to the amount of motion and the hierarchy level, so that all B-frames can be compressed with a single flexible-rate model. The goal is to bring the motion range close to that of the training data, which improves the accuracy of flow prediction, reduces data drift, and ultimately improves compression performance. The main contributions of the paper are:

1. **Adaptive motion flow prediction**: To counter the data drift caused by varying motion ranges, the resolution scale at which flow prediction is performed is selected adaptively. This controls the range of motion vectors between the past and future reference frames, so that the flow-vector distribution seen at inference matches the distribution learned during training, improving flow-prediction accuracy.
2. **Flow-guided multi-scale offset estimation**: In a new application setting, namely current-frame feature prediction based on deformable convolution, the predicted flow (rather than the actual flow) is used to guide multi-scale offset estimation. This lets the model handle geometric transformations and also improves its stability.
3. **Multi-scale context encoding**: Multi-scale conditional encoding improves the representation efficiency of video frames, enhances the quality of the reconstructed video, and optimizes bit-rate allocation by identifying the importance of each region within the frame at a finer granularity.

Together, these components form an efficient bidirectional video compression framework that outperforms the best existing learned B-frame compression models on the UVG test set, especially for sequences with fast and complex motion.
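Flow-guided offset estimation (contribution 2) can be illustrated with a toy NumPy sketch of flow-guided deformable sampling. It assumes a single-channel feature map, a 3x3 kernel, and that each kernel tap's sampling offset is the predicted flow plus a learned per-tap residual; this is one common way to let flow guide deformable-convolution offsets, and the summary does not state exactly how the paper combines them. The names `downscale_flow`, `bilinear_sample`, and `flow_guided_deform_conv`, and the (dx, dy) channel order of the flow, are hypothetical. In a real model the residual would come from a learned offset branch and the operation would run on multi-channel features at each scale.

```python
import numpy as np


def downscale_flow(flow: np.ndarray, s: int) -> np.ndarray:
    """Average-pool a (2, H, W) flow field by factor s and divide the vectors by s,
    so displacements are expressed in pixels of the coarser feature map."""
    c, h, w = flow.shape
    if h % s or w % s:
        raise ValueError("H and W must be divisible by s")
    return flow.reshape(c, h // s, s, w // s, s).mean(axis=(2, 4)) / s


def bilinear_sample(img: np.ndarray, y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Bilinearly sample a (H, W) map at fractional (y, x); zero outside the map."""
    h, w = img.shape
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    wy, wx = y - y0, x - x0
    out = np.zeros(y.shape)
    for dy, dx, wgt in ((0, 0, (1 - wy) * (1 - wx)), (0, 1, (1 - wy) * wx),
                        (1, 0, wy * (1 - wx)), (1, 1, wy * wx)):
        yy, xx = y0 + dy, x0 + dx
        valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        vals = np.zeros(y.shape)
        vals[valid] = img[yy[valid], xx[valid]]
        out += wgt * vals
    return out


def flow_guided_deform_conv(feat, flow, residual, weight):
    """3x3 deformable convolution on a single-channel (H, W) feature map.

    flow:     (2, H, W) predicted flow, channel 0 = dx, channel 1 = dy (assumed order)
    residual: (9, 2, H, W) learned per-tap corrections, stored as (dy, dx)
    weight:   (3, 3) kernel
    Each tap samples at: pixel + tap position + predicted flow + residual.
    """
    h, w = feat.shape
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros((h, w))
    k = 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            oy = flow[1] + residual[k, 0]  # flow shared by all taps + per-tap residual
            ox = flow[0] + residual[k, 1]
            out += weight[ky + 1, kx + 1] * bilinear_sample(feat, gy + ky + oy, gx + kx + ox)
            k += 1
    return out


if __name__ == "__main__":
    feat = np.arange(16, dtype=float).reshape(4, 4)
    flow = np.zeros((2, 4, 4))
    flow[0] = 1.0                          # every pixel points one column to the right
    residual = np.zeros((9, 2, 4, 4))      # untrained: no correction to the flow
    weight = np.zeros((3, 3))
    weight[1, 1] = 1.0                     # center-only kernel
    print(flow_guided_deform_conv(feat, flow, residual, weight))
    print(downscale_flow(flow, 2))         # half-resolution flow, vectors halved
```

With a zero residual and a center-only kernel, the operation reduces to backward warping of the feature map by the flow; the learned residual then lets each tap deviate from the flow where a pure translation is insufficient. A multi-scale variant would call `downscale_flow(flow, s)` and apply the same operation to the feature map at resolution H/s.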