Abstract: While the performance of recent learned intra and sequential video compression models exceeds that of the respective traditional codecs, the performance of learned B-frame compression models generally lags behind traditional B-frame coding. The performance gap is larger for complex scenes with large motions. This is related to the fact that, in hierarchical B-frame compression, the distance between the past and future references varies with the level of the hierarchy, which causes the motion range to vary. The inability of a single B-frame compression model to adapt to these various motion ranges causes a loss of performance. As a remedy, we propose controlling the motion range for flow prediction during inference (to approximately match the range of motions in the training data) by downsampling video frames adaptively according to the amount of motion and the level of hierarchy, so that all B-frames can be compressed using a single flexible-rate model. We present state-of-the-art BD-rate results demonstrating the superiority of our proposed single-model motion-adaptive inference approach over all existing learned B-frame compression models.
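The link between hierarchy level and reference distance can be made concrete with a short sketch. It assumes a dyadic GOP whose size is a power of two and whose two end frames are already coded as anchors; the function names are hypothetical and not taken from the paper. Each B-frame sits halfway between its references, so the span between the past and future references halves at every level.

```python
from typing import Dict, List, Tuple


def hierarchical_b_order(gop: int) -> List[Tuple[int, int, int, int]]:
    """Return (frame, level, past_ref, future_ref) for each B-frame in a GOP.

    Hypothetical helper: assumes frames 0 and `gop` are already coded anchors
    and that `gop` is a power of two (a dyadic hierarchy).
    """
    order = []
    intervals = [(0, gop)]
    level = 1
    # Breadth-first bisection: level 1 codes the middle frame,
    # level 2 the quarter frames, and so on.
    while intervals:
        next_intervals = []
        for past, future in intervals:
            if future - past < 2:
                continue  # no frame left between these two references
            mid = (past + future) // 2
            order.append((mid, level, past, future))
            next_intervals += [(past, mid), (mid, future)]
        intervals = next_intervals
        level += 1
    return order


def reference_span_per_level(gop: int) -> Dict[int, int]:
    """Distance (in frames) between the past and future references at each level."""
    return {level: future - past for _, level, past, future in hierarchical_b_order(gop)}


if __name__ == "__main__":
    print(reference_span_per_level(8))  # {1: 8, 2: 4, 3: 2}
```

With a GOP of 8, the level-1 frame sees references 8 frames apart while level-3 frames see references only 2 frames apart, so an object moving 3 px per frame (an illustrative number) is displaced about 24 px between references at level 1 but only about 6 px at level 3.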

What problem does this paper attempt to address?

The main problem this paper addresses is the weak performance of current learning-based bidirectional (B-frame) video compression models on complex scenes with large motions. In hierarchical B-frame coding, the distance between the past and future reference frames changes with the temporal hierarchy level, so the range of motion the model must handle changes as well. A single learned B-frame compression model has difficulty adapting to all of these motion ranges, which degrades performance. The paper therefore focuses on improving flow prediction in B-frame compression models.

To solve this problem, the authors propose motion-adaptive inference: during inference, video frames are adaptively downsampled according to the amount of motion and the hierarchy level, so that all B-frames can be compressed with a single flexible-rate model. Keeping the motion range close to that of the training data improves the accuracy of flow prediction, reduces data drift (the mismatch between training and inference motion statistics), and ultimately improves compression performance. The main contributions of the paper are:

1. **Adaptive motion flow prediction**: To counter the data drift caused by varying motion ranges, the resolution scale at which flow prediction is performed is selected adaptively. This controls the range of motion vectors between the past and future reference frames, so that the flow-vector distribution at inference matches the distribution learned during training, which improves flow-prediction accuracy.
2. **Flow-guided multi-scale offset estimation**: For current-frame feature prediction based on deformable convolution, the predicted flow, rather than the actual flow, is used to guide multi-scale offset estimation. This lets the model handle geometric transformations while also improving its stability.
3. **Multi-scale context encoding**: Multi-scale conditional coding improves the representation efficiency of video frames and the quality of the reconstructed video, and supports finer bit allocation by identifying the importance of each region within the frame.

Together, these components form an efficient bidirectional video compression framework that outperforms the best existing learning-based B-frame compression models on the UVG test set, especially on sequences with fast and complex motion.
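The text does not say exactly how the downsampling factor for adaptive flow prediction is chosen. The sketch below assumes a simple threshold rule: estimate the displacement between the two references as a per-frame motion estimate times the reference span, then pick the smallest factor that brings it within the largest displacement seen in training. `choose_downsample_factor`, `downsample`, `upsample_flow`, the candidate factors, and the 8 px training bound are all hypothetical. The sketch also shows the step any such scheme needs: a flow predicted on downsampled frames must be upsampled, and its vectors multiplied by the factor, before it is used at full resolution.

```python
import numpy as np


def choose_downsample_factor(motion_per_frame_px, ref_span, train_max_px,
                             candidates=(1, 2, 4, 8)):
    """Smallest factor s with (motion_per_frame_px * ref_span) / s <= train_max_px.

    Hypothetical selection rule: the expected displacement between the past and
    future references grows with the reference span, i.e. with the hierarchy level.
    """
    displacement = motion_per_frame_px * ref_span
    for s in candidates:
        if displacement / s <= train_max_px:
            return s
    return candidates[-1]  # fall back to the largest available factor


def downsample(frame, s):
    """Average-pool an (H, W, C) frame by factor s (H and W must be divisible by s)."""
    h, w, c = frame.shape
    assert h % s == 0 and w % s == 0, "frame size must be divisible by s"
    return frame.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))


def upsample_flow(flow_lr, s):
    """Bring an (h, w, 2) low-resolution flow back to full resolution.

    Pixels are repeated (nearest neighbour) and vectors are multiplied by s,
    because one low-resolution pixel of displacement spans s full-resolution pixels.
    """
    return np.repeat(np.repeat(flow_lr, s, axis=0), s, axis=1) * s


if __name__ == "__main__":
    # Assumed numbers: 3 px/frame motion, GOP of 8, training displacements up to 8 px.
    for level, span in {1: 8, 2: 4, 3: 2}.items():
        print(level, choose_downsample_factor(3.0, span, 8.0))
```

With 3 px/frame motion, this rule picks factors 4, 2 and 1 for reference spans of 8, 4 and 2 frames, so every level presents the flow predictor with displacements of about 6 px, inside the assumed 8 px training range. Nearest-neighbour repetition keeps the sketch dependency-light; a practical codec would more likely upsample the flow bilinearly.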
