Coarse-to-Fine Spatio-Temporal Information Fusion for Compressed Video Quality Enhancement

Dengyan Luo, Mao Ye, Shuai Li, Xue Li
DOI: https://doi.org/10.1109/LSP.2022.3147441
2022-01-01
IEEE Signal Processing Letters
Abstract: With the successful application of deformable convolution in aligning different video frames, it has also been used in video compression artifact reduction. The existing methods based on deformable convolution apply only 2D convolutional layers to generate the features for predicting alignment offsets, which makes the predicted offsets inaccurate due to the limited receptive field. In this letter, we propose a new end-to-end network called Coarse-to-Fine Spatio-Temporal Information Fusion (CF-STIF) for compressed video quality enhancement, which predicts better offsets with a larger receptive field. Specifically, several 3D convolutional layers are first used to roughly fuse the spatio-temporal information in the video sequence, and then a Multi-level Residual Fusion Module (MLRF) is developed to generate global and local fused fine features from different levels for predicting deformable offsets. Thanks to the inherent advantages of 3D convolution and the multi-scale strategy, the receptive field is greatly increased in both the spatial and temporal dimensions, so that information from neighboring frames can be efficiently aggregated. In the end, the enhanced frame is derived by the proposed reconstruction module (REModule). Both qualitative and quantitative experimental results show that the proposed CF-STIF performs better than the state-of-the-art approaches.
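The receptive-field claim in the abstract can be illustrated with the standard per-axis arithmetic for stacked convolutions. The sketch below is not the CF-STIF configuration: the function name `receptive_field`, the layer count, and the kernel and stride values are all hypothetical, chosen only to show how a kernel extent along the time axis and strided (multi-scale) layers each widen the receptive field.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field along ONE axis for a stack of convolution layers.

    Each layer is given as (kernel_size, stride) along that axis.
    """
    rf = 1    # with no layers, one output element sees one input element
    jump = 1  # spacing, in input samples, between neighbouring elements at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks (assumed values, not taken from the letter).
N_LAYERS = 3

# A kernel of size 1 along an axis never widens the field along that axis.
time_kernel_1 = receptive_field([(1, 1)] * N_LAYERS)

# A 3x3x3 kernel has extent 3 along time, so the field grows with depth.
time_kernel_3 = receptive_field([(3, 1)] * N_LAYERS)

# Spatial axis: plain 3-wide layers versus a multi-scale stack with
# two stride-2 layers followed by one stride-1 layer.
spatial_plain = receptive_field([(3, 1)] * N_LAYERS)
spatial_multiscale = receptive_field([(3, 2), (3, 2), (3, 1)])

print(time_kernel_1, time_kernel_3, spatial_plain, spatial_multiscale)
```

Under these assumed values, three layers with a temporal kernel of 3 cover 7 frames where a temporal kernel of 1 covers a single frame, and the strided stack covers 15 pixels per axis where the unstrided one covers 7. The actual numbers for CF-STIF depend on the layer settings given in the letter itself.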