Abstract: This work proposes neural reference synthesis (NRS) to generate high-fidelity reference blocks for motion estimation and motion compensation (MEMC) in inter frame coding. The NRS comprises two submodules: one for reconstruction enhancement and the other for reference generation. Although numerous methods have been developed for these two submodules using either handcrafted rules or deep convolutional neural network (CNN) models, they largely treat the two tasks separately, resulting in limited coding gains. By contrast, the NRS optimizes them collaboratively. It first develops two CNN-based models, namely EnhNet and GenNet. The EnhNet uses only spatial correlations within the current frame for reconstruction enhancement, and the GenNet is then augmented by further aggregating temporal correlations across multiple frames for reference synthesis. However, a direct concatenation of EnhNet and GenNet that ignores the complex temporal reference dependency across inter frames would implicitly induce iterative CNN processing and cause a data overfitting problem, leading to visually disturbing artifacts and oversmoothed pixels. To tackle this problem, the NRS applies a new training strategy to coordinate the EnhNet and GenNet for more robust and generalizable models, and also devises a lightweight multi-level rate-distortion (R-D) selection policy for the encoder to adaptively choose reference blocks generated by either the proposed NRS model or the conventional coding process. Our NRS not only offers state-of-the-art coding gains, e.g., >10% Bjøntegaard Delta Rate (BD-Rate) reduction against the High Efficiency Video Coding (HEVC) anchor for a variety of common test video sequences encoded over a wide bitrate range in both low-delay and random access settings, but also greatly reduces complexity relative to existing learning-based methods by using more lightweight DNNs.
All models are made publicly accessible at https://github.com/IVC-Projects/NRS for reproducible research.
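
As an illustration of the R-D selection idea mentioned in the abstract, the following is a minimal sketch of choosing between a conventionally generated reference block and an NRS-generated one by Lagrangian cost J = D + λR. It is not the authors' implementation: the function names `rd_cost` and `select_reference`, the use of sum of squared error as the distortion, the bit counts, and the λ value are all assumptions made for illustration, and only a single block level is shown rather than the multi-level policy.

```python
import numpy as np


def rd_cost(original, prediction, rate_bits, lam):
    """Lagrangian cost J = D + lam * R, with D as the sum of squared error."""
    diff = original.astype(np.float64) - prediction.astype(np.float64)
    distortion = float(np.sum(diff ** 2))
    return distortion + lam * rate_bits


def select_reference(original, candidates, lam):
    """Pick the candidate with the lowest R-D cost.

    candidates maps a (hypothetical) candidate name to a tuple of
    (predicted block, estimated bits needed to code with it).
    Returns the winning name and the cost of every candidate.
    """
    costs = {
        name: rd_cost(original, pred, bits, lam)
        for name, (pred, bits) in candidates.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


# Toy 8x8 block: the values below are made up for illustration only.
original = np.full((8, 8), 100, dtype=np.uint8)
candidates = {
    "conventional": (np.full((8, 8), 104, dtype=np.uint8), 10),
    "nrs": (np.full((8, 8), 101, dtype=np.uint8), 12),
}
best, costs = select_reference(original, candidates, lam=5.0)
print(best, costs)
```

In this toy case the NRS candidate costs two more bits (for example, for a signalling flag, which is an assumption here) but has much lower distortion, so it wins the comparison. A real encoder would evaluate such costs inside its mode-decision loop and, per the abstract, at more than one level.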

Decomposition, Compression, and Synthesis Based Video Coding: A Neural Approach Through Reference-Based Super Resolution

Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

A Video Coding System with Spatial-Temporal Down-/Up-Sampling and Super-Resolution Reconstruction

Decoder-side Cross Resolution Synthesis for Video Compression Enhancement

Super-Resolving Compressed Video in Coding Chain

A Smart Reference Picture Resampling Approach for VVC

Synthesis-Aware Region-Based 3D Video Coding

CNN-based Super Resolution for Video Coding Using Decoded Information

An Adaptive Down-Sampling Based Video Coding with Hybrid Super-Resolution Method

Semantic Neural Rendering-based Video Coding: Towards Ultra-Low Bitrate Video Conferencing

FM-VSR: Feature Multiplexing Video Super-Resolution for Compressed Video

Enhanced Video Super-Resolution Network Towards Compressed Data

Video Super-Resolution Algorithm Based on Spatial-Temporal Feature and Neural Network

A Dual-Network Based Super-Resolution for Compressed High Definition Video

Neural Reference Synthesis for Inter Frame Coding

Deep Compressed Video Super-Resolution With Guidance of Coding Priors

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data

High-Efficiency Neural Video Compression via Hierarchical Predictive Learning

Spatial-Temporal Transformer based Video Compression Framework

Deep Video Compression with Scaled Hierarchical Bi-directional Motion Model