Splatter a Video: Video Gaussian Representation for Versatile Processing

Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

2024-06-26

Abstract: Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a novel explicit 3D representation, the video Gaussian representation, that embeds a video into 3D Gaussians. Our proposed representation models video appearance in a 3D canonical space using explicit Gaussians as proxies and associates each Gaussian with 3D motions for video motion. This approach offers a more intrinsic and explicit representation than layered atlas or volumetric pixel matrices. To obtain such a representation, we distill 2D priors, such as optical flow and depth, from foundation models to regularize learning in this ill-posed setting. Extensive applications demonstrate the versatility of our new video representation. It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation. Project page: https://sunyangtian.github.io/spatter_a_video_web/

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses several key problems in video representation. Specifically:

1. **Modeling of complex motions**: Existing methods struggle with complex motions, either because they lack 3D structure or because they rely on implicit 3D representations that are ill-suited for manipulation tasks.
2. **Object occlusion**: Current methods handle object occlusions poorly (especially complex self-occlusions), which leads to error propagation and problems in editing tasks.
3. **Tasks requiring 3D information**: Many video processing tasks (such as consistent depth prediction and stereoscopic video generation) require 3D information, which existing methods provide only to a limited extent or not at all.

To solve these problems, the authors propose a new explicit 3D representation, the **Video Gaussian Representation (VGR)**. It embeds the video into 3D Gaussians: explicit Gaussians in a canonical 3D space act as proxies for the video's appearance, and each Gaussian carries 3D motion attributes that control its position at different time steps, which models the video's motion.

The main contributions of VGR are:

- Providing a more intrinsic and explicit representation than layered atlases or volumetric pixel matrices.
- Regularizing learning in this ill-posed setting with 2D priors (such as optical flow and depth) distilled from foundation models.
- Demonstrating a wide range of applications in video processing, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation.

Through these improvements, VGR can better handle complex motions, occlusions, and noise in videos while maintaining temporal consistency, which makes it a stronger basis for these video processing tasks.
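To make the idea of "Gaussians with motion attributes that control their position at different time steps" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The polynomial motion model, the orthographic projection (x and y as image coordinates, z as depth), and the names `positions_at`, `project_orthographic`, and `flow_loss` are all assumptions made for illustration; the summary above does not specify any of them.

```python
import numpy as np


def positions_at(mu0, coeffs, t):
    """Return Gaussian centers at time t.

    mu0:    (N, 3) canonical centers.
    coeffs: (N, K, 3) per-Gaussian motion coefficients (hypothetical
            polynomial basis; offset = sum_k coeffs[:, k] * t**(k + 1)).
    t:      scalar time, e.g. normalized to [0, 1].
    """
    num_terms = coeffs.shape[1]
    powers = t ** np.arange(1, num_terms + 1)           # (K,)
    offset = np.einsum("k,nkc->nc", powers, coeffs)     # (N, 3)
    return mu0 + offset


def project_orthographic(xyz):
    """Split 3D centers into image-plane (x, y) and depth z.

    Assumption: an orthographic camera, so projection just drops z.
    """
    return xyz[:, :2], xyz[:, 2]


def flow_loss(mu0, coeffs, t0, t1, flow_prior):
    """Mean absolute error between the 2D motion of the projected centers
    and a 2D flow prior of shape (N, 2) sampled at those centers.

    The prior stands in for optical flow from an off-the-shelf model.
    """
    uv0, _ = project_orthographic(positions_at(mu0, coeffs, t0))
    uv1, _ = project_orthographic(positions_at(mu0, coeffs, t1))
    return float(np.mean(np.abs((uv1 - uv0) - flow_prior)))


# Two Gaussians: one moves linearly in x, one quadratically in y and z.
mu0 = np.zeros((2, 3))
coeffs = np.zeros((2, 2, 3))
coeffs[0, 0] = [1.0, 0.0, 0.0]    # linear term of Gaussian 0
coeffs[1, 1] = [0.0, 2.0, 0.5]    # quadratic term of Gaussian 1

centers = positions_at(mu0, coeffs, 0.5)
tracks_2d, depth = project_orthographic(centers)
```

Because every Gaussian has an explicit trajectory, tasks such as tracking and consistent depth reduce to reading off `tracks_2d` and `depth` at the desired time steps, and a term like `flow_loss` shows how a 2D prior could constrain the otherwise ill-posed 3D motion.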

VeGaS: Video Gaussian Splatting

Representing Long Volumetric Video with Temporal Gaussian Hierarchy

Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Spatio-Temporal Video Segmentation of Static Scenes and Its Applications

BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors

HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction

Deblur4DGS: 4D Gaussian Splatting from Blurry Monocular Video

SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction

FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives

SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

SuperGaussian: Repurposing Video Models for 3D Super Resolution

MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views

GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation

Dynamic 3D Point Cloud Sequences as 2D Videos