VQNeRV: Vector Quantization Neural Representation for Video Compression

Gai Zhang, Lv Tang, Xinfeng Zhang
DOI: https://doi.org/10.1109/iscas58744.2024.10558613
2024-01-01
Abstract: The application of Implicit Neural Representations (INR) to video compression is an evolving area of research. Despite their potential, conventional CNN-based INR methods often struggle to model complex spatiotemporal scenes, primarily because CNNs are limited in extracting intricate information, particularly temporal dynamics. Integrating spatiotemporal contextual information is therefore necessary to enhance INR's capacity for scene modeling. In response to these challenges, this paper introduces a novel approach, termed Vector Quantization Neural Representation (VQNeRV), designed specifically for video compression. The method unfolds in three stages. First, a CNN-based INR equipped with spatial embeddings is established to model the video content. Second, a Vector Quantization (VQ) encoder distills spatiotemporal features from the video, and these features are fused with the spatial embeddings through cross-attention. Finally, an INR decoder reconstructs the video from the fused features, which embody spatiotemporal contextual information. Experimental results demonstrate that the method achieves better rate-distortion performance than VVC.
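The two operations the abstract names for the second stage, vector quantization and cross-attention fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy codebook, the feature values, and the function names `quantize` and `cross_attention` are assumptions made for illustration, the attention is single-head with no learned projections, and the CNN-based INR, its training, and the decoder are omitted.

```python
import math

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry (squared L2 distance)."""
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices, [codebook[i] for i in indices]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return fused

# Hypothetical toy data: a 3-entry codebook and 2-D features.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
video_features = [[0.9, 1.2], [-0.8, 0.7], [0.1, -0.1]]
spatial_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# Stage 2a: quantize the video features against the codebook.
indices, quantized = quantize(video_features, codebook)  # indices == [1, 2, 0]

# Stage 2b: spatial embeddings (queries) attend over the quantized features.
fused = cross_attention(spatial_embeddings, quantized, quantized)
```

In this sketch only the codebook indices would need to be stored per feature, which is what makes quantized features compact; each fused vector is a convex combination of codebook entries weighted by its similarity to a spatial embedding.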