VeGaS: Video Gaussian Splatting

Weronika Smolak-Dyżewska, Dawid Malarz, Kornel Howil, Jan Kaczmarczyk, Marcin Mazur, Przemysław Spurek
2024-11-17
Abstract: Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses a limitation of existing video representations: Implicit Neural Representations (INRs) offer good reconstruction quality and compression ratios, but they are poorly suited to video editing. The paper proposes the Video Gaussian Splatting (VeGaS) model, which captures nonlinear structure in the video stream with 3D Folded-Gaussians and obtains the 2D Gaussians for each frame by conditioning those 3D Folded-Gaussians on the frame time. Specifically, VeGaS aims to:

1. **Improve video editing capabilities**: Whereas existing 3D Gaussian Splatting (3DGS) based approaches are limited to linear transformations and displacements, VeGaS also supports more complex nonlinear transformations, which yields more realistic edits.
2. **Improve frame reconstruction quality**: By optimizing the temporal positions of the Gaussians and introducing dynamic frame-fitting functions, VeGaS achieves better performance in frame reconstruction tasks.
3. **Achieve efficient video data representation**: By combining 3D Folded-Gaussians with conditioned 2D Gaussians, VeGaS can reduce storage and training time while maintaining high-quality rendering.

The paper evaluates VeGaS experimentally on frame reconstruction and frame interpolation, reporting superior performance, and demonstrates its practical use in video editing.
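
To make the conditioning step concrete, here is a minimal sketch of how a 2D Gaussian for a single frame can be read off a 3D Gaussian over (t, x, y). It uses the standard conditional of an ordinary multivariate Gaussian, not the paper's Folded-Gaussian family, whose exact form is not given in this summary. The names `condition_on_time` and `Gaussian2D` are hypothetical, and the `weight` term (the unnormalized time marginal) is an assumption about how a Gaussian might be faded away from its temporal center; none of this is taken from the VeGaS code.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian2D:
    """Hypothetical container for the per-frame 2D Gaussian."""
    mean: tuple    # (mu_x, mu_y) in the frame at time t0
    cov: tuple     # 2x2 covariance as nested tuples
    weight: float  # assumed fade factor in (0, 1], not from the paper


def condition_on_time(mean, cov, t0):
    """Condition a 3D Gaussian over (t, x, y) on t = t0.

    mean: (mu_t, mu_x, mu_y)
    cov:  3x3 symmetric positive-definite matrix, rows/cols ordered (t, x, y)
    """
    mu_t, mu_x, mu_y = mean
    var_t = cov[0][0]                   # variance along the time axis
    c_tx, c_ty = cov[0][1], cov[0][2]   # time/space covariances
    dt = t0 - mu_t

    # Conditional mean: mu_s + Sigma_st * (t0 - mu_t) / Sigma_tt
    cond_mean = (mu_x + c_tx / var_t * dt,
                 mu_y + c_ty / var_t * dt)

    # Conditional covariance: Sigma_ss - Sigma_st * Sigma_ts / Sigma_tt
    cond_cov = (
        (cov[1][1] - c_tx * c_tx / var_t, cov[1][2] - c_tx * c_ty / var_t),
        (cov[2][1] - c_ty * c_tx / var_t, cov[2][2] - c_ty * c_ty / var_t),
    )

    # Unnormalized marginal density of t at t0 (illustrative assumption).
    weight = math.exp(-0.5 * dt * dt / var_t)
    return Gaussian2D(cond_mean, cond_cov, weight)


if __name__ == "__main__":
    # A Gaussian centered at t = 0.5 that drifts in x as time advances.
    g = condition_on_time(
        mean=(0.5, 10.0, 20.0),
        cov=[[0.04, 0.2, 0.0],
             [0.2, 4.0, 0.5],
             [0.0, 0.5, 1.0]],
        t0=0.6,
    )
    print(g)
```

In this plain-Gaussian case the conditional mean moves linearly in `t0` and the conditional covariance does not depend on `t0` at all, so each Gaussian can only translate along a straight line over time. That is the kind of restriction the abstract says the Folded-Gaussian family is designed to lift in order to capture nonlinear dynamics.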