Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting

Zeyu Yang, Hongye Yang, Zijie Pan, Li Zhang
2024-02-22
Abstract: Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: Existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the reconstruction of dynamic 3D scenes from 2D images and the real-time synthesis of novel views, a problem in computer vision and graphics. It focuses on two limitations of existing methods:

1. **Inadequate scene structure**: Existing methods struggle to reveal the spatial and temporal structure of a dynamic scene when they learn the complex 6D plenoptic function directly.
2. **Scalability of deformation modeling**: Explicitly modeling the deformation of scene elements becomes impractical for complex dynamics.

To address these limitations, the authors treat spacetime as a whole and approximate the spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives with explicit geometry and appearance. The method captures the intrinsic motion of the scene, supports end-to-end training and efficient real-time rendering, and is suited to high-resolution, photorealistic view synthesis of complex dynamic scenes.

### Main contributions

1. **4D Gaussian primitives**: Unbiased 4D Gaussian primitives are proposed and paired with a dedicated splatting-based rendering pipeline, so that space and time are modeled coherently in a single representation.
2. **4D spherindrical harmonics**: 4D spherindrical harmonics are introduced to model color that varies with time and viewing direction in dynamic scenes.
3. **Experimental verification**: Extensive experiments on multiple datasets, covering synthetic and real-world scenes as well as monocular and multi-view settings, show that the method outperforms existing methods in both visual quality and efficiency.

### Method overview

1. **Review of 3D Gaussian splatting**: The paper first reviews 3D Gaussian splatting, which represents a static 3D scene with anisotropic Gaussians and achieves real-time, high-fidelity novel-view synthesis on the GPU.
2. **4D Gaussian representation**: 3D Gaussian splatting is extended to a 4D Gaussian representation for dynamic scenes. Each 4D Gaussian is parameterized by a 4D ellipsoid that can rotate arbitrarily in space and time, and its time- and view-dependent color is represented by 4D spherindrical harmonics.
3. **Optimization framework**: The representation is optimized under supervision from a rendering loss, with density control applied during training to improve geometric and rendering quality.

### Experimental results

- **Multi-view real-world scenes**: Quantitative evaluation on the Plenoptic Video dataset shows that the method significantly outperforms existing methods in rendering quality and speed, and it is reported to be the only method that delivers high-quality dynamic novel-view synthesis while rendering in real time.
- **Monocular synthetic video**: Experiments on the D-NeRF dataset show that the method also performs well on monocular dynamic scenes and can efficiently share information across time steps without imposing topological assumptions.

In conclusion, the paper proposes a method that addresses the key challenges of dynamic scene reconstruction and novel-view synthesis through 4D Gaussian primitives and 4D spherindrical harmonics.
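
As an illustration of the 4D Gaussian representation described in the method overview, the sketch below builds a 4D covariance from a rotation and per-axis scales, then conditions it on a timestamp with the standard conditional-Gaussian formulas to obtain a 3D Gaussian and a temporal weight. It is a minimal NumPy sketch, not the authors' implementation: the function names (`rotation_4d`, `covariance_4d`, `slice_at_time`), the two-plane rotation parameterization, and the unnormalized temporal weight are all assumptions made for illustration.

```python
import numpy as np


def rotation_4d(angle_xy: float, angle_zt: float) -> np.ndarray:
    """Hypothetical helper: a 4D rotation built from two plane rotations.

    The xy block rotates within space; the zt block mixes the z axis with
    time. A general 4D rotation has six degrees of freedom, so this is
    only a small illustrative subset.
    """
    c1, s1 = np.cos(angle_xy), np.sin(angle_xy)
    c2, s2 = np.cos(angle_zt), np.sin(angle_zt)
    rot = np.eye(4)
    rot[:2, :2] = [[c1, -s1], [s1, c1]]
    rot[2:, 2:] = [[c2, -s2], [s2, c2]]
    return rot


def covariance_4d(scales, rot: np.ndarray) -> np.ndarray:
    """Covariance of a 4D ellipsoid: Sigma = (R S)(R S)^T."""
    m = rot @ np.diag(np.asarray(scales, dtype=float))
    return m @ m.T


def slice_at_time(mean, cov: np.ndarray, t: float):
    """Condition a 4D Gaussian over (x, y, z, t) on a fixed time t.

    Returns the conditional 3D mean, the conditional 3D covariance, and
    an unnormalized temporal weight in (0, 1] (assumed form: a 1D
    Gaussian in t with peak value 1).
    """
    mean = np.asarray(mean, dtype=float)
    mu_x, mu_t = mean[:3], mean[3]
    cov_xx = cov[:3, :3]   # spatial block
    cov_xt = cov[:3, 3]    # space-time coupling
    var_t = cov[3, 3]      # temporal variance

    cond_mean = mu_x + cov_xt * (t - mu_t) / var_t
    cond_cov = cov_xx - np.outer(cov_xt, cov_xt) / var_t
    weight = np.exp(-0.5 * (t - mu_t) ** 2 / var_t)
    return cond_mean, cond_cov, weight


# Example values are arbitrary and chosen only for illustration.
mean = np.array([0.0, 0.0, 0.0, 0.5])
cov = covariance_4d([1.0, 1.0, 0.2, 0.6], rotation_4d(0.3, 0.4))

for t in (0.1, 0.5, 0.9):
    mu3, cov3, w = slice_at_time(mean, cov, t)
    print(f"t={t:.1f}  mean={np.round(mu3, 3)}  weight={w:.3f}")
```

In this sketch the rotation couples the z and t axes, so the conditional mean drifts along z as t changes while the conditional covariance stays fixed. That coupling is one way a single 4D primitive can encode motion without a separate deformation model.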