QS-NeRV: Real-Time Quality-Scalable Decoding with Neural Representation for Videos

Chang Wu, Guancheng Quan, Gang He, Xin-Quan Lai, Yunsong Li, Wenxin Yu, Xianmeng Lin, Cheng Yang
DOI: https://doi.org/10.1145/3664647.3680586
2024-01-01
Abstract: In this paper, we propose QS-NeRV, a neural representation for videos that enables real-time quality-scalable decoding. QS-NeRV comprises a Self-Learning Distribution Mapping Network (SDMN) and Extensible Enhancement Networks (EENs). First, SDMN functions as the base layer (BL) for scalable video coding and focuses on encoding lower-quality video. Within SDMN, rather than transmitting information directly, we exchange it between the encoder and decoder in a way that minimizes bitstream overhead. Specifically, we use an invertible network to map the multi-scale information obtained from the encoder to a specific distribution. During decoding, this information is then recovered from a randomly sampled latent variable to help the decoder achieve better reconstruction performance. Second, EENs serve as the enhancement layers (ELs) and are trained in an overfitting manner to obtain robust restoration capability. By combining the fixed BL bitstream with the parameters of an EEN as an extension pack, the decoder can produce higher-quality enhanced videos. Furthermore, the method is scalable: the number of combined packs can be adjusted to accommodate diverse quality requirements. Experimental results demonstrate that the proposed QS-NeRV outperforms state-of-the-art real-time-decoding INR-based methods on various datasets for video compression and interpolation tasks.
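The invertible-mapping idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the SDMN architecture, so everything here is an assumption for illustration: a single additive coupling layer (a generic invertible building block, not necessarily what QS-NeRV uses), the hypothetical names `shift`, `forward`, and `inverse`, and a standard Gaussian as the target distribution. The sketch shows only that such a layer inverts exactly, and that a freshly sampled latent can stand in for one that is never transmitted.

```python
import math
import random

def shift(v, w):
    # Hypothetical conditioning function. It is never inverted itself,
    # so any deterministic function of v would work here.
    return [math.tanh(w * x) for x in v]

def forward(a, b, w=0.5):
    """Additive coupling: pass `a` through, shift `b` by a function of `a`."""
    return a, [bi + si for bi, si in zip(b, shift(a, w))]

def inverse(a, z, w=0.5):
    """Exact inverse of `forward`: subtract the same shift."""
    return a, [zi - si for zi, si in zip(z, shift(a, w))]

random.seed(0)
a = [random.gauss(0, 1) for _ in range(4)]  # part that is kept
b = [random.gauss(0, 1) for _ in range(4)]  # part mapped to a latent

# Encoder side: map (a, b) to (a, z).
a_out, z = forward(a, b)

# With the true latent z, the inverse recovers b up to float rounding.
a_rec, b_rec = inverse(a_out, z)
max_err = max(abs(x - y) for x, y in zip(b, b_rec))

# Decoder side (assumed setting): z is not transmitted, so a random
# sample from the target distribution is used in its place.
z_sampled = [random.gauss(0, 1) for _ in range(4)]
_, b_approx = inverse(a_out, z_sampled)
```

In a trained model, the forward pass would be optimized so that the latent actually follows the target distribution, which is what makes the sampled stand-in useful; this untrained sketch demonstrates only the mechanics.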