Abstract: Despite the substantial progress of novel view synthesis, existing methods, whether based on Neural Radiance Fields (NeRF) or the more recent 3D Gaussian Splatting (3DGS), suffer significant degradation when the input becomes sparse. Numerous efforts have been made to alleviate this problem, but they still struggle to synthesize satisfactory results efficiently, especially in large scenes. In this paper, we propose SCGaussian, a Structure Consistent Gaussian Splatting method that uses matching priors to learn a 3D consistent scene structure. Considering the high interdependence of Gaussian attributes, we optimize the scene structure in two respects: the rendering geometry and, more importantly, the position of Gaussian primitives, which is hard to constrain directly in vanilla 3DGS because of its unstructured nature. To achieve this, we present a hybrid Gaussian representation. Besides the ordinary non-structure Gaussian primitives, our model also contains ray-based Gaussian primitives that are bound to matching rays and whose positions are optimized only along the ray. Thus, we can use the matching correspondence to directly drive the positions of these Gaussian primitives to converge to the surface points where the rays intersect. Extensive experiments on forward-facing, surrounding, and complex large scenes show the effectiveness of our approach, with state-of-the-art performance and high efficiency. Code is available at https://github.com/prstrive/SCGaussian.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper aims to address the problem of novel view synthesis (NVS) in few-shot scenarios. Specifically, existing methods such as Neural Radiance Fields (NeRF) or the recently proposed 3D Gaussian Splatting (3DGS) perform poorly when input views are sparse, especially in large-scale scenes. Despite numerous attempts to mitigate this issue, these methods still struggle to efficiently generate satisfactory results.
### Background and Challenges
1. **Limitations of Existing Methods**:
- **NeRF**: NeRF produces high-quality novel views from dense inputs, but its quality degrades significantly under sparse views, and it is costly in time and computational resources.
- **3DGS**: 3DGS greatly improves rendering speed through efficient differentiable splatting, but its novel views still degrade when the input views are sparse.
2. **Challenges of Sparse Views**:
- **Insufficient Multi-view Constraints**: With only a few input views to learn from, the scene is under-constrained, which makes optimization difficult.
- **Interdependence of Gaussian Attributes**: The position and shape of a Gaussian primitive are optimized together and are ambiguous with respect to each other, which makes it hard to constrain the position directly.
### Solution
To address these challenges, the authors propose SCGaussian (Structure Consistent Gaussian Splatting), a method that leverages matching priors to learn a 3D consistent scene structure. Specifically:
1. **Hybrid Gaussian Representation**:
- In addition to ordinary unstructured Gaussian primitives, the model introduces ray-based Gaussian primitives. Each of these is bound to a matching ray, and its position is optimized only along that ray.
2. **Optimization Strategy**:
- **Position Optimization**: The matching correspondences are used to drive the positions of the ray-based Gaussian primitives toward the surface points where the matched rays intersect.
- **Geometry Optimization**: The rendering geometry is optimized by minimizing projection errors, so that it stays consistent with the positions and shapes of the Gaussian primitives.
3. **Loss Function**:
- The standard photometric loss, the Gaussian position loss, and the rendering geometry loss are combined into a single loss that guides model optimization.
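
The ray-bound parameterization and the position loss described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the identity-rotation pinhole camera, the toy numbers, the loss weights `w_pos` and `w_geo`, and the grid search used in place of gradient-based optimization are all assumptions made for illustration.

```python
import math

def ray_point(origin, direction, depth):
    """Position of a ray-bound primitive: origin + depth * direction.

    Only `depth` is treated as learnable; the ray itself stays fixed.
    """
    return tuple(o + depth * d for o, d in zip(origin, direction))

def project(point, cam_center, focal, principal):
    """Pinhole projection into a camera with identity rotation (an assumption)."""
    x, y, z = (p - c for p, c in zip(point, cam_center))
    return (focal * x / z + principal[0], focal * y / z + principal[1])

def position_loss(depth, ray, other_cam, matched_px):
    """Pixel distance between the reprojected primitive and its matched pixel."""
    origin, direction = ray
    cam_center, focal, principal = other_cam
    u, v = project(ray_point(origin, direction, depth), cam_center, focal, principal)
    return math.hypot(u - matched_px[0], v - matched_px[1])

def total_loss(photometric, position, geometry, w_pos=1.0, w_geo=1.0):
    """Combined objective; the weights are hypothetical, not taken from the paper."""
    return photometric + w_pos * position + w_geo * geometry

# Toy setup: a ray from view A (origin, direction with unit z-component)
# and a second view B given as (center, focal length, principal point).
ray_a = ((0.0, 0.0, 0.0), (0.125, 0.0, 1.0))
cam_b = ((1.0, 0.0, 0.0), 100.0, (50.0, 50.0))
matched_px_b = (37.5, 50.0)  # pixel in view B matched to the ray from view A

# A coarse grid search stands in for gradient descent on the depth.
candidates = [2.0, 3.0, 4.0, 5.0]
best_depth = min(candidates, key=lambda z: position_loss(z, ray_a, cam_b, matched_px_b))
print(best_depth, ray_point(*ray_a, best_depth))
```

In this toy setup the matched pixel was generated from the surface point (0.5, 0, 4), so the search selects depth 4.0, where the reprojection error is zero. Because the primitive can only slide along its ray, the match determines its position through a single scalar, which is the intuition behind constraining position directly rather than through rendering alone.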
### Experimental Results
1. **Quantitative Comparison**:
- Experiments on multiple datasets (LLFF, IBRNet, Tanks and Temples, DTU, and Blender) show that SCGaussian outperforms existing methods in both rendering quality and efficiency.
- SCGaussian demonstrates significant advantages, particularly in complex large-scale scenes and low-texture scenes.
2. **Qualitative Analysis**:
- Visual comparisons show that SCGaussian generates more accurate and detailed novel views, especially under sparse-view conditions.
### Conclusion
By introducing matching priors and a hybrid Gaussian representation, this paper addresses the challenges of novel view synthesis under sparse views, achieving high-quality and efficient novel view generation.