High Efficiency Deep-learning Based Video Compression

Lv Tang, Xinfeng Zhang
DOI: https://doi.org/10.1145/3661311
2024-04-23
Abstract: Although deep learning techniques have achieved significant improvements in image compression, their advantages have not been fully explored in video compression. As a result, the performance of deep-learning based video compression (DLVC) remains clearly inferior to that of the hybrid video coding framework. In this paper, we propose a novel network that improves DLVC in its most important modules: Motion Process (MP), Residual Compression (RC) and Frame Reconstruction (FR). In MP, we design a split second-order attention and multi-scale feature extraction module to fully remove warping artifacts in both the multi-scale feature space and the pixel space, which helps reduce distortion in the subsequent processing. In RC, we propose a channel selection mechanism that gradually drops redundant information while preserving informative channels, yielding better rate-distortion performance. Finally, in FR, we introduce a residual multi-scale recurrent network that improves the quality of the current reconstructed frame by progressively exploiting the temporal context between it and several previously reconstructed frames. Extensive experiments on three widely used video compression datasets (HEVC, UVG and MCL-JCV) demonstrate the superiority of the proposed approach over state-of-the-art methods.
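Rate-distortion (RD) performance, which the RC module targets, is what learned codecs are usually trained to optimize through a combined objective L = λ·D + R. The following is a minimal sketch of that generic objective in Python with NumPy, assuming mean squared error as the distortion D and a rate R estimated in bits per pixel from the likelihoods an entropy model assigns to the quantized latents. The function name `rd_loss` and the example values are illustrative assumptions; this is the standard objective from the learned-compression literature, not necessarily the exact loss used in this paper.

```python
import numpy as np


def rd_loss(original, reconstructed, likelihoods, lam):
    """Generic rate-distortion objective L = lam * D + R (illustrative sketch).

    Assumptions, not taken from the paper:
      - D is the mean squared error between frames with values in [0, 1].
      - R is the estimated rate in bits per pixel, derived from the
        likelihoods an entropy model assigns to the quantized latents.
    """
    height, width = original.shape[-2:]
    num_pixels = height * width
    distortion = np.mean((original - reconstructed) ** 2)
    # The ideal code length of a symbol with probability p is -log2(p) bits;
    # sum over all latent symbols and normalise by the number of pixels.
    rate_bpp = -np.sum(np.log2(likelihoods)) / num_pixels
    return lam * distortion + rate_bpp, distortion, rate_bpp


if __name__ == "__main__":
    frame = np.zeros((3, 4, 4))                # toy 3-channel 4x4 frame
    recon = np.full((3, 4, 4), 0.1)            # uniformly off by 0.1
    probs = np.full(8, 0.5)                    # 8 latent symbols, 1 bit each
    print(rd_loss(frame, recon, probs, lam=256.0))
```

Larger values of `lam` weight distortion more heavily, so a model trained with them spends more bits to reach higher quality; learned codecs typically train one model per λ to obtain several operating points on the RD curve.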
computer science, information systems, theory & methods, software engineering
What problem does this paper attempt to address?
The problem this paper addresses is that, although deep-learning techniques have brought significant improvements to image compression, their advantages have not been fully exploited in video compression. As a result, the performance of deep-learning based video compression (DLVC) is clearly lower than that of the hybrid video coding framework. To close this gap, the authors propose a new network that improves DLVC in the three most important modules of a video codec: motion processing (MP), residual compression (RC) and frame reconstruction (FR). Specifically, the paper makes the following contributions:

1. **Motion Processing (MP)**: A split second-order attention and multi-scale feature extraction module (SOAME) is designed to remove warping artifacts in both the multi-scale feature space and the pixel space, reducing distortion in subsequent processing.
2. **Residual Compression (RC)**: A channel selection mechanism (CS) gradually discards redundant information while retaining informative channels, giving better rate-distortion (RD) performance.
3. **Frame Reconstruction (FR)**: A residual multi-scale recurrent network (RMRN) improves the quality of the current coarsely reconstructed frame by progressively exploiting the temporal context between it and several previously reconstructed frames.

Together, these modules aim to remove video redundancy more effectively in both the spatial and temporal dimensions, thereby improving the overall performance of deep-learning based video compression. Experimental results show that the method outperforms existing state-of-the-art methods on three widely used video compression datasets (HEVC, UVG and MCL-JCV).
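The channel selection idea behind the RC module (keep informative channels, drop redundant ones, and do so gradually) can be illustrated with a simple magnitude heuristic. This is a minimal sketch in Python with NumPy under stated assumptions: channel importance is scored by mean absolute activation, and the lowest-scoring channels are zeroed so that they would cost almost no bits after entropy coding. The function `select_channels` and its `keep_ratio` parameter are hypothetical; the paper's CS mechanism is a learned module, and its actual scoring is not described in this summary.

```python
import numpy as np


def select_channels(features, keep_ratio):
    """Keep the most informative channels of a (C, H, W) feature map.

    Hypothetical stand-in for a learned channel-selection step: each channel
    is scored by its mean absolute activation, the top-k channels are kept,
    and the remaining channels are zeroed out.
    """
    num_channels = features.shape[0]
    k = max(1, int(round(keep_ratio * num_channels)))
    scores = np.mean(np.abs(features), axis=(1, 2))   # one score per channel
    keep = np.argsort(scores)[::-1][:k]               # indices of the k highest scores
    mask = np.zeros(num_channels, dtype=bool)
    mask[keep] = True
    return features * mask[:, None, None], mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(8, 4, 4))
    # Gradual dropping: each stage keeps a smaller fraction of the channels.
    # Zeroed channels score 0, so later stages choose among the survivors.
    for ratio in (0.75, 0.5, 0.25):
        latent, mask = select_channels(latent, ratio)
        print(ratio, int(mask.sum()))
```

Zeroing channels instead of deleting them keeps the tensor shape fixed, which simplifies the downstream entropy model and decoder in this sketch; a learned selector would replace the magnitude score with a trainable gating signal.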