Abstract: In the field of video compression, the pursuit of better quality at lower bit rates remains a long-standing goal. Recent developments have demonstrated the potential of Implicit Neural Representation (INR) as a promising alternative to traditional transform-based methodologies. Video INRs can be roughly divided into frame-wise and pixel-wise methods according to the structure the network outputs. While pixel-wise methods are better suited to upsampling and parallelization, frame-wise methods have demonstrated better performance. We introduce CoordFlow, a novel pixel-wise INR for video compression. It yields state-of-the-art results compared to other pixel-wise INRs and on-par performance compared to leading frame-wise techniques. The method is based on the separation of the visual information into visually consistent layers, each represented by a dedicated network that compensates for the layer's motion. When the layers are integrated, a byproduct is an unsupervised segmentation of the video sequence. Object motion trajectories are implicitly utilized to compensate for visual-temporal redundancies. Additionally, the proposed method provides inherent video upsampling, stabilization, inpainting, and denoising capabilities.

What problem does this paper attempt to address?

This paper addresses how to achieve better video quality at a lower bit rate in video compression. Specifically, it aims to improve on existing video compression techniques by introducing CoordFlow, a new pixel-level implicit neural representation (INR) method.

### Main Problems

1. **Quality and Efficiency of Video Compression**:
   - Traditional transform-based compression methods (such as JPEG, MPEG, and H.264) have reached the limits of their adaptability and performance.
   - Implicit neural representation (INR), as an emerging technology, has the potential to overcome the limitations of traditional methods and provide higher compression efficiency and better video quality.
2. **Advantages and Disadvantages of Frame-level and Pixel-level INR**:
   - Although frame-level INR methods achieve better rate-distortion performance, they have difficulty capturing fine-grained temporal dynamics.
   - Pixel-level INR methods offer advantages in parallelization and upsampling, but their performance is usually not as good as that of frame-level methods.
3. **Motion Compensation and Redundancy Elimination**:
   - Existing video compression methods have deficiencies in handling motion compensation and temporal redundancy, which lowers compression efficiency.

### Solutions

The paper proposes CoordFlow, a new pixel-level INR method, which addresses these problems in the following ways:

- **Separate Visual Information**: Decompose the video sequence into visually consistent layers. Each layer is represented by a dedicated network that compensates for the layer's motion.
- **Multi-layer Architecture**: By combining multiple CoordFlow layers, the model can adaptively segment video content and represent different objects separately.
- **Unsupervised Segmentation**: When processing videos, CoordFlow can automatically segment the foreground and background, thereby improving compression efficiency and video quality.
- **Utilize Motion Trajectories**: Implicitly use the motion trajectories of objects to reduce visual-temporal redundancy.

### Summary

The main contribution of CoordFlow is that it achieves state-of-the-art performance among pixel-level INR methods and performs on par with leading frame-level INR techniques, with strong results on some natural-scene videos. In addition, CoordFlow provides inherent video upsampling, stabilization, inpainting, and denoising capabilities, making it a versatile tool for video processing and editing tasks.
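
The layered, motion-compensated idea described above can be illustrated with a small forward-pass sketch. This is not the paper's architecture: the affine motion model, the sine-activated MLPs, the softmax blending, and every name used here (`Layer`, `render`, `frame_coords`) are assumptions chosen for illustration, and the weights are random rather than fitted to a video. The sketch only shows how a pixel-wise representation can map a coordinate `(x, y, t)` to a color by first warping `(x, y)` per layer and then blending the layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random weights for a small MLP; a real model would fit these to one video."""
    return [(rng.normal(0, 1 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """MLP forward pass with sine activations on the hidden layers."""
    for w, b in params[:-1]:
        x = np.sin(x @ w + b)
    w, b = params[-1]
    return x @ w + b

class Layer:
    """Hypothetical layer: a flow net (t -> affine motion) plus a content net."""
    def __init__(self, hidden=32):
        self.flow = mlp_init([1, hidden, 6])     # t -> 6 affine parameters (assumed motion model)
        self.content = mlp_init([3, hidden, 4])  # (x', y', t) -> RGB + blending logit

    def __call__(self, xy, t):
        theta = mlp(self.flow, t)                            # (N, 6)
        A = theta[:, :4].reshape(-1, 2, 2) + np.eye(2)       # affine matrix as offset from identity
        shift = theta[:, 4:]                                 # (N, 2) translation
        xy_c = np.einsum('nij,nj->ni', A, xy) + shift        # motion-compensated coordinates
        out = mlp(self.content, np.concatenate([xy_c, t], axis=1))
        return out[:, :3], out[:, 3:]                        # colors (N, 3), logit (N, 1)

def render(layers, xy, t):
    """Blend per-layer colors with softmax weights (assumed blending rule)."""
    rgbs, logits = zip(*(layer(xy, t) for layer in layers))
    logits = np.concatenate(logits, axis=1)                  # (N, L)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    rgb = sum(w[:, None] * c for w, c in zip(weights.T, rgbs))
    return rgb, weights

def frame_coords(height, width, t):
    """Normalized pixel grid in [-1, 1] for one time step t."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing='ij')
    xy = np.stack([xs.ravel(), ys.ravel()], axis=1)
    return xy, np.full((xy.shape[0], 1), t)

layers = [Layer(), Layer()]                # e.g. one layer per visually consistent region
xy, t = frame_coords(4, 6, 0.5)
rgb, weights = render(layers, xy, t)       # rgb: (24, 3), weights: (24, 2)
```

In this sketch the per-pixel blending weights play the role of a soft segmentation, and because every pixel is queried independently, calling `frame_coords` with a denser grid returns an upsampled frame from the same networks.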

CoordFlow: Coordinate Flow for Pixel-wise Neural Video Representation

FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos

Implicit Neural Video Compression

Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression

Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation

Streaming Neural Images

Implicit-explicit Integrated Representations for Multi-view Video Compression

Implicit Neural Representation for Videos Based on Residual Connection

INR-V: A Continuous Representation Space for Video-based Generative Tasks

Neural Video Representation for Redundancy Reduction and Consistency Preservation

NERV++: An Enhanced Implicit Neural Video Representation

DNeRV: Modeling Inherent Dynamics Via Difference Neural Representation for Videos

VQNeRV: Vector Quantization Neural Representation for Video Compression

Frame-Recurrent Video Inpainting by Robust Optical Flow Inference

PS-NeRV: Patch-wise Stylized Neural Representations for Videos

Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation

Breaking the Barriers of One-to-One Usage of Implicit Neural Representation in Image Compression: A Linear Combination Approach with Performance Guarantees

Progressive Fourier Neural Representation for Sequential Video Compilation

NVRC: Neural Video Representation Compression

Boosting Neural Representations for Videos with a Conditional Decoder