CoordFlow: Coordinate Flow for Pixel-wise Neural Video Representation

Daniel Silver, Ron Kimmel
2025-01-02
Abstract: In the field of video compression, the pursuit of better quality at lower bit rates remains a long-standing goal. Recent developments have demonstrated the potential of Implicit Neural Representation (INR) as a promising alternative to traditional transform-based methodologies. Video INRs can be roughly divided into frame-wise and pixel-wise methods according to the structure of the network output. While pixel-wise methods are better suited to upsampling and parallelization, frame-wise methods have demonstrated better performance. We introduce CoordFlow, a novel pixel-wise INR for video compression. It yields state-of-the-art results compared to other pixel-wise INRs and on-par performance compared to leading frame-wise techniques. The method is based on the separation of the visual information into visually consistent layers, each represented by a dedicated network that compensates for the layer's motion. When the layers are integrated, a byproduct is an unsupervised segmentation of the video sequence. Object motion trajectories are implicitly utilized to compensate for visual-temporal redundancies. Additionally, the proposed method provides inherent video upsampling, stabilization, inpainting, and denoising capabilities.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is how to achieve better video quality at a lower bit rate in video compression. Specifically, it aims to improve on existing video compression techniques by introducing CoordFlow, a new pixel-wise implicit neural representation (INR) method.

### Main Problems

1. **Quality and Efficiency of Video Compression**:
   - Traditional transform-based compression methods (such as JPEG, MPEG, and H.264) are described as approaching the limits of their adaptability and performance.
   - Implicit neural representation (INR), as an emerging technology, has the potential to overcome the limitations of traditional methods and provide higher compression efficiency and better video quality.
2. **Advantages and Disadvantages of Frame-wise and Pixel-wise INR**:
   - Although frame-wise INR methods achieve better rate-distortion performance, they struggle to capture fine-grained temporal dynamics.
   - Pixel-wise INR methods have advantages in parallelization and upsampling, but their performance is usually not as good as that of frame-wise methods.
3. **Motion Compensation and Redundancy Elimination**:
   - Existing methods handle motion compensation and temporal redundancy imperfectly, which limits compression efficiency.

### Solutions

The paper proposes CoordFlow, a new pixel-wise INR method, which addresses these problems in the following ways:

- **Separate Visual Information**: The video sequence is decomposed into visually consistent layers. Each layer is represented by a dedicated network that also compensates for that layer's motion.
- **Multi-layer Architecture**: By combining multiple CoordFlow layers, the model can adaptively segment the video content and represent different objects separately.
- **Unsupervised Segmentation**: While fitting a video, CoordFlow separates foreground from background without supervision, which improves compression efficiency and video quality.
- **Utilize Motion Trajectories**: The motion trajectories of objects are used implicitly to reduce visual-temporal redundancy.

### Summary

The main contribution of CoordFlow is that it achieves state-of-the-art results among pixel-wise INR methods and performs on par with leading frame-wise INR methods, with particularly strong results on some natural-scene videos. In addition, CoordFlow provides inherent video upsampling, stabilization, inpainting, and denoising capabilities, making it a versatile tool for video processing and editing tasks.
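
To make the layered, motion-compensated idea described under Solutions more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the rigid (shift plus rotation) motion model, the sine activations, the network sizes, the softmax blending, and every name (`make_layer`, `render`, `pixel_grid`) are assumptions chosen for illustration, and the weights are random rather than trained. Each hypothetical layer has a small "flow" network that maps time to motion parameters and a "color" network that is queried at the motion-compensated coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small fully connected network (illustrative sizes)."""
    return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Sine activations on hidden layers, linear output (an assumed design)."""
    for w, b in params[:-1]:
        x = np.sin(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def make_layer():
    """One hypothetical layer: a flow network and a color network."""
    return {
        "flow": init_mlp([1, 16, 3]),    # t -> (dx, dy, theta), assumed rigid motion
        "color": init_mlp([3, 32, 4]),   # (x', y', t) -> (r, g, b, blend logit)
    }

def pixel_grid(height, width):
    """Pixel-centre coordinates in [-1, 1], flattened to shape (height*width, 2)."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def render(layers, xy, t):
    """Query every layer at motion-compensated coordinates and blend the results.

    Returns the blended RGB values, shape (N, 3), and the per-layer blend
    weights, shape (N, L). The argmax of the weights is a soft stand-in for
    the unsupervised segmentation mentioned in the text.
    """
    rgbs, logits = [], []
    t_col = np.full((xy.shape[0], 1), t)
    for layer in layers:
        dx, dy, theta = mlp(layer["flow"], np.array([[t]]))[0]
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s], [s, c]])
        canon = (xy + np.array([dx, dy])) @ rot.T       # compensate layer motion
        out = mlp(layer["color"], np.concatenate([canon, t_col], axis=1))
        rgbs.append(1.0 / (1.0 + np.exp(-out[:, :3])))  # sigmoid keeps RGB in [0, 1]
        logits.append(out[:, 3])
    logits = np.stack(logits, axis=1)
    logits -= logits.max(axis=1, keepdims=True)         # numerically stable softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    rgb = sum(weights[:, i:i + 1] * rgbs[i] for i in range(len(layers)))
    return rgb, weights

layers = [make_layer() for _ in range(2)]
rgb, weights = render(layers, pixel_grid(4, 6), t=0.5)
segmentation = weights.argmax(axis=1).reshape(4, 6)
```

Because the representation is queried per coordinate, the same `layers` can be rendered on a denser grid, for example `render(layers, pixel_grid(8, 12), t=0.5)`, which illustrates why pixel-wise INRs lend themselves to upsampling. In a real system the weights would be fitted to a video and then compressed; none of that is shown here.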