Spatio-Temporal Information Fusion Network for Compressed Video Quality Enhancement

Weiwei Huang, Kebin Jia, Pengyu Liu, Yuan Yu
DOI: https://doi.org/10.1109/dcc55655.2023.00055
2023-01-01
Abstract: Video is usually compressed with standard codecs to facilitate storage and transmission, but compression introduces artifacts that degrade visual quality. Improving the quality of compressed video during post-processing has therefore become an important topic in the multimedia field. This paper proposes a Spatio-Temporal Information Fusion Network for quality enhancement of compressed video, as shown in Fig. 1. The algorithm comprises two parts. In the first part, we use 3D convolution to build a U-shaped network that models the temporal dynamics between input frames. Features with the same spatial resolution are concatenated along the channel dimension from shallow layers to deep layers through skip connections, which helps the local information in shallow-layer features reach the output. In the second part, we design a quality enhancement module to fully mine the spatio-temporal information extracted in the first part: it slices the feature map along the time dimension t, then extracts and refines features from each slice separately in the spatial dimensions. The network is trained end to end, and the data sets are selected from the Xiph (Xiph.org) and VQEG databases. We use the H.265/HEVC reference software HM16.5 to compress the videos and evaluate the model under different compression levels. The experimental results show that the average PSNR of 18 HEVC standard sequences is improved by 0.88 dB, 0.86 dB, 0.81 dB and 0.72 dB when the quantization parameter (QP) equals 37, 32, 27 and 22, respectively, while the model has only 0.66 million parameters.
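The reported gains are PSNR improvements: the PSNR of the enhanced frame against the uncompressed original, minus the PSNR of the compressed frame against the same original. The sketch below illustrates that arithmetic on toy data only. The function names `psnr` and `delta_psnr`, the 8-bit peak value of 255, and the 2x2 sample frames are assumptions made for illustration; the abstract does not state which channel is measured or how frames and sequences are averaged, and none of the numbers here come from the paper.

```python
import math


def psnr(reference, test, peak=255.0):
    """Return PSNR in dB between two equally sized frames (lists of rows)."""
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("frames must have the same shape")

    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def delta_psnr(original, compressed, enhanced, peak=255.0):
    """PSNR gain in dB of the enhanced frame over the compressed frame."""
    return psnr(original, enhanced, peak) - psnr(original, compressed, peak)


# Toy 2x2 frames (hypothetical values, not data from the paper).
original = [[100, 110], [120, 130]]
compressed = [[104, 106], [124, 126]]  # every pixel off by 4, so MSE = 16
enhanced = [[102, 108], [122, 128]]    # every pixel off by 2, so MSE = 4

gain = delta_psnr(original, compressed, enhanced)
print(f"PSNR gain: {gain:.2f} dB")
```

Halving the per-pixel error in this toy case cuts the MSE by a factor of four, which corresponds to a gain of 10·log10(4) dB. A figure such as the 0.88 dB reported at QP 37 would be an average of gains of this kind over the test sequences.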