Abstract: Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet these methods assume dense multi-view videos as supervision. In this work, we are interested in extending the capability of Gaussian scene representations to casually captured monocular videos. We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained. Building off this finding, we propose a method we call Dynamic Gaussian Marbles, which consists of three core modifications that target the difficulties of the monocular setting. First, we use isotropic Gaussian "marbles", reducing the degrees of freedom of each Gaussian. Second, we employ a hierarchical divide-and-conquer learning strategy to efficiently guide the optimization towards solutions with globally coherent motion. Finally, we add image-level and geometry-level priors into the optimization, including a tracking loss that takes advantage of recent progress in point tracking. By constraining the optimization, Dynamic Gaussian Marbles learns Gaussian trajectories that enable novel-view rendering and accurately capture the 3D motion of the scene elements. We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality and is on par with non-Gaussian representations, all while maintaining the efficiency, compositionality, editability, and tracking benefits of Gaussians.
Our project page can be found at https://geometry.stanford.edu/projects/dynamic-gaussian-marbles.github.io/.
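To make the first modification concrete, the sketch below contrasts the covariance of a standard anisotropic 3D Gaussian (Σ = R S Sᵀ Rᵀ, built from a rotation quaternion and three per-axis scales, as in 3D Gaussian splatting) with an isotropic marble whose covariance is s²I. It is a minimal illustration, not the authors' code: the function names are hypothetical, and the parameter counts assume plain RGB color without spherical harmonics and count stored values (a quaternion stores four numbers for three rotational degrees of freedom).

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def anisotropic_cov(q, scales):
    """Standard 3DGS-style covariance: Sigma = R S S^T R^T with per-axis scales."""
    R = quat_to_rotmat(q)
    S = np.diag(np.asarray(scales, dtype=float))
    return R @ S @ S.T @ R.T

def isotropic_cov(s):
    """Marble covariance: one scale s gives Sigma = s^2 I, independent of any rotation."""
    return (s ** 2) * np.eye(3)

# Hypothetical stored-parameter counts per primitive, assuming plain RGB color (no SH).
ANISO_PARAMS = {"position": 3, "rotation": 4, "scale": 3, "opacity": 1, "rgb": 3}
MARBLE_PARAMS = {"position": 3, "scale": 1, "opacity": 1, "rgb": 3}
```

Because an isotropic covariance is unchanged by rotation, a marble has no orientation to optimize; under these assumptions each marble stores 8 values instead of 14, which is one way to see the reduced degrees of freedom the abstract refers to.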

What problem does this paper attempt to address?

The paper addresses novel view synthesis of dynamic scenes from casually captured monocular videos. Specifically, it asks how to recover 3D geometry, motion, and radiance from a dynamic scene filmed by a single camera, so that the scene can be rendered from new viewpoints. The core difficulty is recovering 3D information from a single viewpoint per frame, which is much harder than in the multi-view setting, where additional cameras provide constraints that aid reconstruction. The paper observes that although existing 4D Gaussian methods perform well on multi-view videos, they fail badly on monocular videos because the monocular setting is underconstrained. To overcome this, the authors propose a method named "Dynamic Gaussian Marbles", which adapts to the challenges of monocular video through three core modifications:

1. **Isotropic Gaussian "marbles"**: reduce the degrees of freedom of each Gaussian, so that the optimization focuses on motion and appearance rather than local shape.
2. **Hierarchical divide-and-conquer learning**: progressively guide the optimization toward globally coherent motion.
3. **Image-level and geometry-level priors**: add a tracking loss and other priors to the optimization to improve the robustness and accuracy of the model.

With these modifications, Dynamic Gaussian Marbles achieves high-quality novel view synthesis from monocular videos while retaining the efficient rendering, tracking ability, and editability of Gaussian representations.
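The hierarchical divide-and-conquer strategy can be pictured as a bottom-up merge schedule over the video's frames. The sketch below is an assumption about its general shape rather than the paper's algorithm: it starts with one segment per frame and merges adjacent pairs level by level until a single segment spans the whole video. The function name `merge_schedule`, the pairwise merging rule, and the handling of an odd leftover segment are all hypothetical; a comment marks where per-merge optimization would happen.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)

def merge_schedule(num_frames: int) -> List[List[Segment]]:
    """Hypothetical bottom-up schedule that doubles segment length at each level.

    Level 0 holds one segment per frame; each later level merges adjacent pairs.
    An odd segment left at the end of a level is carried forward unchanged.
    """
    if num_frames < 1:
        raise ValueError("need at least one frame")
    level = [(t, t + 1) for t in range(num_frames)]
    levels = [level]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            left, right = level[i], level[i + 1]
            # Hypothetical: a real pipeline would propagate each segment's Gaussians
            # into the other's frames here and jointly optimize the merged segment.
            merged.append((left[0], right[1]))
        if len(level) % 2 == 1:
            merged.append(level[-1])  # carry the unpaired segment to the next level
        level = merged
        levels.append(level)
    return levels
```

For a 5-frame clip, `merge_schedule(5)` yields four levels, the last being the single segment `(0, 5)`. Merging short, well-constrained segments first and longer ones later is one plausible way to steer the optimization toward globally coherent motion, as the summary describes.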

Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps

Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis

GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos

A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis

Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting

D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis

Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting

Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction

Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes

MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting

Splatter a Video: Video Gaussian Representation for Versatile Processing

4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization

A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes

Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion

Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos