Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model

Feng Ye, Li Zhang, Chuanmin Jia
DOI: https://doi.org/10.1145/3664647.3685524
2024-01-01
Abstract: Deep video compression has attracted increasing attention in recent years due to its end-to-end optimization ability. However, most existing neural video compression (NVC) models focus on incorporating sophisticated motion or residual coding networks that remove spatial-temporal redundancy between successive frames, neglecting efficient motion representation and the essential structure of scaled prediction for motion dynamics. To resolve this problem, this paper proposes a novel model, named the scaled hierarchical bi-directional prediction structure, which effectively captures temporal correlation among frames and accounts for quality variation when managing the reference frames. This paper first introduces parameter-shared motion codecs and efficient information fusion strategies to obtain predictive features more precisely. Subsequently, scaled motions from temporal contexts are learned as a bi-directional prior for motion representation. Additionally, the concept of trustworthy motion modeling is proposed to represent the effectiveness of reference information, measuring the reliability of predictive accuracy under complex motions, camera rotations, and occlusions. Extensive experimental results demonstrate that our approach offers significant advantages over state-of-the-art bi-directional NVC models in coding efficiency. The proposed method has been adopted as the latest reference model by the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) end-to-end video coding (EEV) standard. The code is available at: https://github.com/yefeng00/DVC_with_Scaled_Hierarchical_Bi_directional_Motion_Model.