Temporal Context Mining for Learned Video Compression

Xihua Sheng, Jiahao Li, Bin Li, Li Li, Dong Liu, Yan Lu
DOI: https://doi.org/10.1109/tmm.2022.3220421
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Applying deep learning to video compression has attracted increasing attention in recent years. In this work, we address end-to-end learned video compression with a special focus on better learning and utilizing temporal contexts. We propose to propagate not only the last reconstructed frame but also the feature obtained just before the reconstructed frame, and to use this feature for temporal context mining. From the propagated feature, we learn multi-scale temporal contexts and re-fill them into the modules of our compression scheme, including the contextual encoder-decoder, the frame generator, and the temporal context encoder. We discard the parallelization-unfriendly auto-regressive entropy model to pursue more practical encoding and decoding times. Experimental results show that our proposed scheme achieves a higher compression ratio than existing learned video codecs. Our scheme also outperforms x264 and x265 (representing industrial software for H.264 and H.265, respectively) as well as the official reference software for H.264, H.265, and H.266 (JM, HM, and VTM, respectively). Specifically, when the intra period is 32 and the scheme is oriented to PSNR, it achieves a 14.4% bit rate saving over H.265-HM; when oriented to MS-SSIM, it achieves a 21.1% bit rate saving over H.266-VTM.
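To make the reported savings concrete, the following is a minimal arithmetic sketch, assuming each percentage is read as an average reduction in bit rate at equal quality relative to the anchor codec. The function name and the 1000 kbps anchor bit rate are hypothetical and chosen only for illustration; only the two percentages come from the abstract.

```python
def bitrate_at_equal_quality(anchor_kbps: float, saving_percent: float) -> float:
    """Bit rate needed after a given percentage saving relative to an anchor codec."""
    if not 0.0 <= saving_percent < 100.0:
        raise ValueError("saving_percent must be in [0, 100)")
    return anchor_kbps * (1.0 - saving_percent / 100.0)


# Hypothetical anchor bit rate; not a figure from the paper.
ANCHOR_KBPS = 1000.0

# 14.4% saving over H.265-HM (PSNR-oriented), as stated in the abstract.
psnr_case = bitrate_at_equal_quality(ANCHOR_KBPS, 14.4)

# 21.1% saving over H.266-VTM (MS-SSIM-oriented), as stated in the abstract.
msssim_case = bitrate_at_equal_quality(ANCHOR_KBPS, 21.1)

print(f"PSNR-oriented:    {psnr_case:.1f} kbps for the same quality")
print(f"MS-SSIM-oriented: {msssim_case:.1f} kbps for the same quality")
```

Under this reading, a stream that the anchor codec encodes at 1000 kbps would need roughly 856 kbps and 789 kbps, respectively, for comparable quality.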