Abstract: Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved state-of-the-art rendering quality and speed, combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less areas and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the redundancy and limited robustness of 3D Gaussian Splatting (3D-GS) on complex scenes. Although 3D-GS achieves state-of-the-art rendering quality and speed, it has two problems:
1. **Redundancy**: In trying to fit every training view, 3D-GS often produces a large number of redundant Gaussians, which makes the model bulky and hard to scale to large, complex scenes.
2. **Lack of Robustness**: Because 3D-GS fits the training views while neglecting the underlying scene geometry, it is less robust to significant view changes, textureless regions, and lighting effects.
To solve these problems, the authors propose Scaffold-GS, which uses anchor points to distribute local 3D Gaussians and predicts the attributes of those Gaussians on the fly from the viewing direction and distance within the view frustum. This reduces redundant Gaussians and improves the model's ability to handle varying levels of detail and view-dependent observations without sacrificing rendering speed.
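The idea of predicting attributes on the fly can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-anchor feature vector, the two-layer MLP, the sigmoid on the output, the choice of opacity as the predicted attribute, and all names and sizes (`view_inputs`, `predict_opacity`, `k`, `feat_dim`) are assumptions made for illustration. It shows only that the same anchors yield attributes that depend on where the camera is.

```python
import numpy as np


def view_inputs(anchor_xyz, camera_xyz):
    """Return (distance, unit direction) from the camera to each anchor."""
    delta = anchor_xyz - camera_xyz                       # (N, 3)
    dist = np.linalg.norm(delta, axis=1, keepdims=True)   # (N, 1)
    direction = delta / np.maximum(dist, 1e-8)            # (N, 3), avoids divide-by-zero
    return dist, direction


def predict_opacity(anchor_feat, anchor_xyz, camera_xyz, w1, b1, w2, b2):
    """Hypothetical MLP: anchor feature + distance + direction -> k opacities per anchor."""
    dist, direction = view_inputs(anchor_xyz, camera_xyz)
    x = np.concatenate([anchor_feat, dist, direction], axis=1)  # (N, F + 4)
    hidden = np.maximum(x @ w1 + b1, 0.0)                        # ReLU layer
    raw = hidden @ w2 + b2                                       # (N, k)
    return 1.0 / (1.0 + np.exp(-raw))                            # sigmoid (an assumption)


# Illustrative sizes and random weights; a real model would learn these.
rng = np.random.default_rng(0)
num_anchors, feat_dim, hidden_dim, k = 4, 8, 16, 3
anchor_xyz = rng.uniform(-1.0, 1.0, size=(num_anchors, 3))
anchor_feat = rng.normal(size=(num_anchors, feat_dim))
w1 = 0.1 * rng.normal(size=(feat_dim + 4, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = 0.1 * rng.normal(size=(hidden_dim, k))
b2 = np.zeros(k)

camera_xyz = np.array([0.0, 0.0, 5.0])
opacity = predict_opacity(anchor_feat, anchor_xyz, camera_xyz, w1, b1, w2, b2)
print(opacity.shape)  # (4, 3): k opacities for each of the 4 anchors
```

Because the distance and direction are recomputed for each camera position, nothing view-specific has to be stored per Gaussian; only the anchors, their features, and the shared MLP weights persist between views.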
### Main Contributions
1. **Hierarchical and Region-Aware Scene Representation**: Anchors initialized from a sparse voxel grid guide the distribution of local 3D Gaussians, forming a hierarchical, region-aware scene representation.
2. **View-Dependent Neural Gaussian Prediction**: Within the view frustum, the attributes of the neural Gaussians are predicted on the fly from each anchor, adapting to different viewing directions and distances and yielding more robust novel view synthesis.
3. **Reliable Anchor Growing and Pruning Strategy**: The predicted neural Gaussians are used to decide where anchors are grown and which anchors are pruned, improving scene coverage.
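The first and third contributions can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: placing one anchor at the centre of each occupied voxel, summarising importance as a per-anchor accumulated opacity, and the threshold value are all hypothetical choices, as are the function names `init_anchors` and `prune_anchors`. Anchor growing is omitted because the summary above does not state its criterion.

```python
import numpy as np


def init_anchors(points, voxel_size):
    """Keep one anchor per occupied voxel (here: the voxel centre, an assumption)."""
    idx = np.floor(points / voxel_size).astype(np.int64)  # integer voxel coordinates
    occupied = np.unique(idx, axis=0)                      # drop duplicate voxels
    return (occupied + 0.5) * voxel_size


def prune_anchors(anchors, accumulated_opacity, threshold):
    """Drop anchors whose (hypothetical) importance score falls below the threshold."""
    keep = accumulated_opacity >= threshold
    return anchors[keep]


# Four input points fall into three distinct unit voxels.
points = np.array([
    [0.1, 0.1, 0.1],
    [0.2, 0.3, 0.4],
    [1.2, 0.1, 0.1],
    [-0.1, 0.0, 0.0],
])
anchors = init_anchors(points, voxel_size=1.0)
print(anchors.shape)  # (3, 3)

# Made-up importance scores, one per anchor.
scores = np.array([0.0, 2.0, 0.3])
kept = prune_anchors(anchors, scores, threshold=0.5)
print(kept)  # [[0.5 0.5 0.5]]
```

Voxelising first is what removes redundancy at initialization: many nearby input points collapse to a single anchor, and the voxel size controls how coarse that collapse is.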
### Experimental Results
Through extensive experiments, the authors demonstrate the superior performance of the Scaffold-GS method on multiple datasets, especially when handling complex outdoor scenes and detailed indoor environments. Compared to 3D-GS, Scaffold-GS significantly reduces storage requirements while maintaining similar rendering speed and achieving better visual quality. Specifically, the improvements are shown in the following aspects:
- **Visual Quality**: In challenging cases such as textureless regions, lighting effects, and fine geometric structures, Scaffold-GS produces higher visual quality, with less blur and fewer needle-like artifacts.
- **Storage Efficiency**: Scaffold-GS requires several times less storage than 3D-GS.
- **View Adaptability**: Scaffold-GS adapts and generalizes better to multi-scale scenes and novel viewpoints.
### Conclusion
By introducing anchors and view-dependent prediction of neural Gaussians, Scaffold-GS addresses the redundancy and limited robustness of 3D-GS, offering an efficient approach to high-quality rendering of complex scenes.