Abstract: Although deep learning has achieved significant improvements in image compression, its advantages have not been fully explored in video compression, so the performance of deep-learning-based video compression (DLVC) remains clearly inferior to that of the hybrid video coding framework. In this paper, we propose a novel network that improves DLVC through its most important modules: Motion Process (MP), Residual Compression (RC) and Frame Reconstruction (FR). In MP, we design a split second-order attention and multi-scale feature extraction module to fully remove warping artifacts in both multi-scale feature space and pixel space, which helps reduce distortion in the subsequent processing. In RC, we propose a channel selection mechanism that gradually drops redundant information while preserving informative channels for better rate-distortion performance. Finally, in FR, we introduce a residual multi-scale recurrent network that improves the quality of the current reconstructed frame by progressively exploiting temporal context information between it and its several previous reconstructed frames. Extensive experiments on three widely used video compression datasets (HEVC, UVG and MCL-JCV) demonstrate the superiority of the proposed approach over state-of-the-art methods.
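Learned codecs usually pursue the rate-distortion trade-off mentioned in the abstract by minimizing an objective of the form λ·D + R. The sketch below computes that objective for a single-channel frame, assuming MSE as the distortion D and estimating the rate R in bits per pixel from the probabilities an entropy model assigns to the quantized latent symbols. In a video codec the rate would sum the bits for both motion and residual latents; here a single list of likelihoods stands in for both. The function name `rd_loss` and its arguments are hypothetical, and this is not the paper's exact training loss.

```python
import math


def rd_loss(original, reconstructed, likelihoods, lam):
    """Return (loss, distortion, bpp) for one single-channel frame.

    original, reconstructed: flat lists of pixel values in [0, 1].
    likelihoods: probabilities a (hypothetical) entropy model assigns to
        each quantized latent symbol; the rate in bits is -sum(log2 p).
    lam: Lagrange multiplier weighting distortion against rate.
    """
    n = len(original)
    if n == 0 or n != len(reconstructed):
        raise ValueError("frames must be non-empty and the same size")
    # Distortion: mean squared error between the two frames.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / n
    # Rate: ideal code length of the latents, normalized by pixel count.
    bits = -sum(math.log2(p) for p in likelihoods)
    bpp = bits / n
    return lam * mse + bpp, mse, bpp


loss, distortion, bpp = rd_loss([0.5] * 4, [0.5, 0.5, 0.5, 0.7], [0.5, 0.25], lam=256)
print(round(distortion, 4), bpp, round(loss, 2))  # 0.01 0.75 3.31
```

Raising `lam` penalizes distortion more heavily and pushes training toward higher-quality, higher-bitrate reconstructions; training one model per λ value is a common way to obtain several rate-distortion operating points.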

What problem does this paper attempt to address?

The paper addresses a gap in video compression: although deep-learning techniques have significantly improved image compression, their advantages have not been fully explored in video compression, so deep-learning-based video compression (DLVC) still performs noticeably worse than the hybrid video coding framework. To close this gap, the authors propose a new network that improves DLVC through the three most important modules of a video codec: motion processing (MP), residual compression (RC) and frame reconstruction (FR). Specifically, the paper proposes the following:

1. **Motion Processing (MP)**: A split second-order attention and multi-scale feature extraction module (SOAME) is designed to remove warping artifacts in both multi-scale feature space and pixel space, reducing distortion in the subsequent processing steps.
2. **Residual Compression (RC)**: A channel selection (CS) mechanism gradually discards redundant information while retaining informative channels, yielding better rate-distortion (RD) performance.
3. **Frame Reconstruction (FR)**: A residual multi-scale recurrent network (RMRN) improves the quality of the current coarsely reconstructed frame by progressively exploiting the temporal context between it and several previously reconstructed frames.

Together, these components aim to remove video redundancy more effectively in both the spatial and temporal domains, improving the overall performance of DLVC. Experiments on three widely used video compression datasets (HEVC, UVG and MCL-JCV) show that the method outperforms existing state-of-the-art methods.
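The channel selection idea described for RC can be illustrated with a toy, non-learned sketch. The importance score (mean absolute activation per channel), the fixed keep-schedule, and the helper names `channel_scores`, `select_channels` and `progressive_selection` are all hypothetical assumptions; the paper's mechanism is trained end to end, and this only shows how dropping low-scoring channels in stages shrinks the representation that must be entropy coded.

```python
def channel_scores(features):
    """Mean absolute activation of each channel (hypothetical importance score)."""
    return [sum(abs(v) for v in channel) / len(channel) for channel in features]


def select_channels(features, keep):
    """Keep the `keep` highest-scoring channels, preserving their original order."""
    if not 0 < keep <= len(features):
        raise ValueError("keep must be between 1 and the number of channels")
    scores = channel_scores(features)
    ranked = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return kept, [features[i] for i in kept]


def progressive_selection(features, schedule):
    """Drop channels in stages; schedule=[3, 2] keeps 3 channels, then 2 of those.

    In a real network, learned layers between stages would change the
    activations, so later stages could rank channels differently; here the
    features are left unchanged to keep the example small.
    """
    original_index = list(range(len(features)))
    for keep in schedule:
        kept, features = select_channels(features, keep)
        # Map stage-local indices back to the original channel numbering.
        original_index = [original_index[i] for i in kept]
    return original_index, features


# Four channels of two activations each (a flattened toy feature map).
features = [[0.1, -0.1], [2.0, 1.0], [0.5, 0.5], [-3.0, 1.0]]
indices, kept = progressive_selection(features, [3, 2])
print(indices)  # [1, 3]
```

Channels 1 and 3 survive because they carry the largest activations (scores 1.5 and 2.0), while the weaker channels 0 and 2 are dropped over the two stages.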

High Efficiency Deep-learning Based Video Compression
