Abstract: Recent works in volume rendering, e.g., NeRF and 3D Gaussian Splatting (3DGS), significantly advance rendering quality and efficiency with the help of the learned implicit neural radiance field or 3D Gaussians. Rendering on top of an explicit representation, the vanilla 3DGS and its variants deliver real-time efficiency by optimizing the parametric model with single-view supervision per iteration during training, a scheme adopted from NeRF. Consequently, certain views are overfitted, leading to unsatisfying appearance in novel-view synthesis and imprecise 3D geometries. To solve the aforementioned problems, we propose a new 3DGS optimization method embodying four key novel contributions: 1) We transform the conventional single-view training paradigm into a multi-view training strategy. With our proposed multi-view regulation, 3D Gaussian attributes are further optimized without overfitting certain training views. As a general solution, we improve the overall accuracy in a variety of scenarios and different Gaussian variants. 2) Inspired by the benefit introduced by additional views, we further propose a cross-intrinsic guidance scheme, leading to a coarse-to-fine training procedure concerning different resolutions. 3) Built on top of our multi-view regulated training, we further propose a cross-ray densification strategy, densifying more Gaussian kernels in the ray-intersect regions from a selection of views. 4) By further investigating the densification strategy, we found that the effect of densification should be enhanced when certain views differ dramatically. As a solution, we propose a novel multi-view augmented densification strategy, where 3D Gaussians are encouraged to get densified to a sufficient number accordingly, resulting in improved reconstruction accuracy.
What problem does this paper attempt to address?
The paper addresses overfitting in 3D Gaussian Splatting: existing 3DGS-based methods overfit certain training views in novel view synthesis (NVS), which yields unsatisfactory appearance and inaccurate 3D geometry in novel views. Specifically, the conventional paradigm supervises with a single view per training iteration, so the 3D Gaussian kernels fit specific views too closely, degrading overall rendering quality and geometric accuracy.
To solve these problems, the authors propose a new optimization method, Multi-View-Guided Gaussian Splatting (MVGS), with four key contributions:
1. **Multi-view training strategy**: The conventional single-view training paradigm is replaced with a multi-view training strategy. Multi-view regulation further optimizes the 3D Gaussian attributes without overfitting specific training views, improving overall accuracy across a variety of scenes and different Gaussian variants.
2. **Cross-intrinsic guidance scheme**: Inspired by the benefit of additional views, a cross-intrinsic guidance scheme that moves from low resolution to high resolution gives a coarse-to-fine training process, enabling the 3D Gaussians to better adapt to pixel-level local features.
3. **Cross-ray densification strategy**: Building on multi-view regulated training, a cross-ray densification strategy adds more Gaussian kernels in the regions where rays from a selection of views intersect, improving reconstruction performance.
4. **Multi-view augmented densification strategy**: When the differences between views are significant, a multi-view augmented densification strategy encourages the 3D Gaussians to densify to a sufficient number, improving reconstruction accuracy.
The authors report that, in extensive experiments, the method significantly improves the novel view synthesis performance of Gaussian-based explicit representation methods across multiple tasks, especially for complex scenes with strong reflections, transparency, and fine details.
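The cross-intrinsic guidance scheme trains at several resolutions. One common way to get the camera intrinsics for a lower-resolution rendering of the same view is to divide the focal lengths and principal point by the downsampling factor. The sketch below assumes a simple pinhole model. The `Intrinsics` class, the `downscale` and `coarse_to_fine` helpers, and the factor schedule `(8, 4, 2, 1)` are hypothetical illustrations, not the paper's implementation, and the principal-point scaling ignores half-pixel offset conventions.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole intrinsics (hypothetical container, in pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def downscale(k: Intrinsics, factor: int) -> Intrinsics:
    """Intrinsics of the same camera rendered at 1/factor resolution."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return Intrinsics(
        fx=k.fx / factor,
        fy=k.fy / factor,
        cx=k.cx / factor,
        cy=k.cy / factor,
        width=k.width // factor,
        height=k.height // factor,
    )


def coarse_to_fine(k: Intrinsics,
                   factors: Sequence[int] = (8, 4, 2, 1)) -> List[Intrinsics]:
    """Intrinsics for each stage, ordered from coarsest to finest.

    The factor schedule is an assumption made for illustration.
    """
    return [downscale(k, f) for f in factors]


full = Intrinsics(fx=1000.0, fy=1000.0, cx=400.0, cy=300.0,
                  width=800, height=600)
for level in coarse_to_fine(full):
    print(level.width, level.height, level.fx)
```

Each stage would supervise the same Gaussians with images resized to the matching resolution; how the stages are weighted or scheduled is not specified in this summary.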
### Formula display
The paper's formulas include the loss function, the gradient computation, and an adaptive threshold. The key formulas are shown below:
- **Loss function of the original 3DGS**:
\[
L(G, E_i, K_i)=\frac{1}{HW}\sum_{p = 0}^{H\times W}\Big[(1-\lambda)\cdot L_1(I_i(p), C(r(p, E_i, K_i)))+\lambda\cdot L_{\text{D-SSIM}}(I_i(p), C(r(p, E_i, K_i)))\Big]
\]
- **Gradient calculation of multi-view-guided learning**:
\[
\frac{\partial L}{\partial \{G\}}=\frac{\partial L(G_1, E_1, K_1)}{\partial G_1}+\frac{\partial L(G_2, E_2, K_2)}{\partial G_2}+\cdots+\frac{\partial L(G_M, E_M, K_M)}{\partial G_M}
\]
- **Adaptive threshold calculation**:
\[
\hat{\beta}=\beta\left(\frac{r_i}{\tau}-1\right)H\left(\frac{r_i}{\tau}-1\right)+\beta\left(1 - H\left(\frac{r_i}{\tau}-1\right)\right)
\]
where \(H(\cdot)\) is the Heaviside step function, which returns 1 when its input is greater than or equal to 0 and 0 otherwise.
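The adaptive threshold can be evaluated directly from the expression above. The following is a literal transcription, not the authors' code: the function names are hypothetical, and because this summary does not define \(r_i\), \(\tau\), or \(\beta\), they are treated as plain positive numbers.

```python
def heaviside(x: float) -> float:
    """Step function as defined in the text: 1 for x >= 0, else 0."""
    return 1.0 if x >= 0 else 0.0


def adaptive_threshold(beta: float, r_i: float, tau: float) -> float:
    """Evaluate beta_hat exactly as the formula is written.

    beta_hat = beta * (r_i/tau - 1) * H(r_i/tau - 1)
             + beta * (1 - H(r_i/tau - 1))
    """
    if tau == 0:
        raise ValueError("tau must be non-zero")
    ratio = r_i / tau - 1.0
    h = heaviside(ratio)
    return beta * ratio * h + beta * (1.0 - h)


# Below tau the base threshold is returned unchanged;
# at or above tau it is scaled by (r_i/tau - 1).
print(adaptive_threshold(beta=2.0, r_i=1.0, tau=2.0))
print(adaptive_threshold(beta=2.0, r_i=6.0, tau=2.0))
```

As written, the expression returns \(\beta\) for \(r_i < \tau\) and \(\beta(r_i/\tau - 1)\) for \(r_i \ge \tau\), so it evaluates to 0 at exactly \(r_i = \tau\).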
Together, these formulas define the per-view photometric loss, the accumulation of gradients over \(M\) views in each iteration, and the adaptive threshold used for densification in multi-view training.
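The loss and gradient formulas above can be illustrated with a small pure-Python sketch. It assumes the per-view L1 and D-SSIM terms have already been computed by some renderer, and it represents each per-view gradient as a flat list of floats. The helper names are hypothetical. The default `lam=0.2` follows the value commonly used in the original 3DGS code and is an assumption about the setting here.

```python
from typing import List, Sequence


def mean_l1(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error over pixels (the 1/HW-normalised L1 term)."""
    if len(rendered) != len(target) or not target:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(target)


def mixed_loss(l1: float, d_ssim: float, lam: float = 0.2) -> float:
    """(1 - lambda) * L1 + lambda * D-SSIM for a single view."""
    return (1.0 - lam) * l1 + lam * d_ssim


def accumulate_gradients(per_view_grads: Sequence[Sequence[float]]) -> List[float]:
    """Sum the per-view gradients, one term per view, as in the M-view formula."""
    if not per_view_grads:
        raise ValueError("need at least one view")
    total = [0.0] * len(per_view_grads[0])
    for grad in per_view_grads:
        if len(grad) != len(total):
            raise ValueError("all gradients must have the same length")
        for j, value in enumerate(grad):
            total[j] += value
    return total


# Toy example: three views contribute gradients for two parameters.
grads = [[1.0, 2.0], [0.5, -2.0], [0.25, 1.0]]
print(accumulate_gradients(grads))
print(mixed_loss(mean_l1([0.5, 1.0], [0.0, 1.0]), d_ssim=0.5))
```

In an autograd framework the same effect is typically obtained by summing the \(M\) per-view losses before a single backward pass, or by calling backward once per view and letting the gradients accumulate before the optimizer step.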