Enhancing Temporal Context for Learned Video Compression

Fei Xiong, Jiayu Yang, Ronggang Wang
DOI: https://doi.org/10.1145/3665026.3665027
2024-01-01
Abstract: To meet the demand for video transmission, learned video compression has drawn growing attention from the research community. In most existing works, motion estimation is performed by an optical flow prediction network. However, flow estimation naturally fails in complex cases such as occlusions, introducing unexpected noise into the warped temporal features. To tackle this issue, our work proposes a mask-based temporal context enhancement method. By explicitly analyzing the correlation between the warped frame and the original frame, a mask that reflects the credibility of flow warping is generated at the encoder side and then transmitted along with the motion information. By applying the mask to the temporal features, we filter out the noise and generate multi-scale temporal contexts with clearer semantics. Experimental results demonstrate that our model achieves rate-distortion performance gains in terms of both PSNR and MS-SSIM over the baseline contextual coding model.
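The masking idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the paper generates the mask by analyzing the correlation between the warped and original frames at the encoder, whereas the sketch below swaps in a hand-crafted rule based on the per-pixel absolute difference. The function names (`credibility_mask`, `downsample2x`, `apply_mask`), the `sharpness` parameter, and the use of average pooling to obtain a second scale are all assumptions made for illustration.

```python
import math


def credibility_mask(warped, original, sharpness=10.0):
    """Hypothetical stand-in for the mask described in the abstract.

    Returns values near 1 where the warped frame agrees with the original
    frame and values near 0 where warping is unreliable (e.g. occlusions).
    """
    return [
        [math.exp(-sharpness * abs(w - o)) for w, o in zip(w_row, o_row)]
        for w_row, o_row in zip(warped, original)
    ]


def downsample2x(grid):
    """Average-pool a 2D grid by a factor of 2 (assumes even height and width)."""
    height, width = len(grid), len(grid[0])
    return [
        [
            (grid[y][x] + grid[y][x + 1] + grid[y + 1][x] + grid[y + 1][x + 1]) / 4.0
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def apply_mask(feature, mask):
    """Element-wise product that attenuates unreliable feature positions."""
    return [
        [f * m for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mask)
    ]


# Toy 4x4 frames with pixel values in [0, 1].
original = [[0.5] * 4 for _ in range(4)]
warped = [row[:] for row in original]
warped[0][0] = 1.0  # simulate a warping failure at one pixel

# Full-resolution mask and a half-resolution copy for a coarser scale.
mask = credibility_mask(warped, original)
mask_half = downsample2x(mask)

# Placeholder temporal features at two scales (all ones for clarity).
feat_full = [[1.0] * 4 for _ in range(4)]
feat_half = [[1.0] * 2 for _ in range(2)]

ctx_full = apply_mask(feat_full, mask)
ctx_half = apply_mask(feat_half, mask_half)
```

In this toy setup the feature at the mismatched pixel is suppressed almost to zero while well-aligned positions pass through unchanged, which is the filtering effect the abstract attributes to the mask. In the described method the mask is also transmitted with the motion information, so a decoder would reuse it rather than recompute it from the original frame.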