Decomposition, Compression, and Synthesis Based Video Coding: A Neural Approach Through Reference-Based Super Resolution

Ming Lu,Tong Chen,zhenyu Dai,Dong Wang,Dandan Ding,Zhan Ma
2020-01-01
Abstract:In pursuit of higher compression efficiency, a potential solution is the Down-Sampling based Video Coding (DSVC) where a input video is first downscaled for encoding at a relatively lower resolution, and then decoded frames are super-resolved through deep neural networks (DNNs). However, the coding gains are often bounded due to either uniform resolution sampling induced severe loss of high-frequency component, or insufficient information aggregation across non-uniformly sampled frames in existing DSVC methods. To address this, we propose to first decompose the input video into respective spatial texture frames (STFs) at its native spatial resolution that preserve the rich spatial details, and the other temporal motion frames (TMFs) at a lower spatial resolution that retain the motion smoothness; then compress them together using any popular video coder; and finally synthesize decoded STFs and TMFs for high-fidelity video reconstruction at the same resolution as its native input. This work simply applies the bicubic sampling in decomposition and Versatile Video Coding (VVC) compliant codec in compression, and puts the focus on the synthesis part. Such cross-resolution synthesis can be facilitated by Reference-based Super-Resolution (RefSR). Specifically, a motion compensation network (MCN) is devised on TMFs to efficiently align and aggregate temporal motion features that will be jointly processed with corresponding STFs using a texture transfer network (TTN) to better augment spatial details, by which the compression and resolution re-sampling noises can be effectively alleviated with better rate-distortion (R-D) efficiency, etc.
What problem does this paper attempt to address?