GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting and Meshing

Jianing Zhang, Yuchao Zheng, Ziwei Li, Qionghai Dai, Xiaoyun Yuan
2024-12-08
Abstract: Gaussian splatting has gained attention for its efficient representation and rendering of 3D scenes using continuous Gaussian primitives. However, it struggles with sparse-view inputs due to limited geometric and photometric information, causing ambiguities in depth, shape, and texture. We propose GBR: Generative Bundle Refinement, a method for high-fidelity Gaussian splatting and meshing using only 4-6 input views. GBR integrates a neural bundle adjustment module to enhance geometric accuracy and a generative depth refinement module to improve geometric fidelity. More specifically, the neural bundle adjustment module uses a foundation network to produce initial 3D point maps and point matches from unposed images, followed by bundle adjustment optimization to improve multiview consistency and point cloud accuracy. The generative depth refinement module employs a diffusion-based strategy to enhance geometric details and fidelity while preserving the scale. Finally, for Gaussian splatting optimization, we propose a multimodal loss function incorporating depth and normal consistency, geometric regularization, and pseudo-view supervision, providing robust guidance under sparse-view conditions. Experiments on widely used datasets show that GBR significantly outperforms existing methods under sparse-view inputs. Additionally, GBR demonstrates the ability to reconstruct and render large-scale real-world scenes, such as the Pavilion of Prince Teng and the Great Wall, with remarkable detail using only 6 views.
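Bundle adjustment jointly refines camera poses and 3D points by minimizing the reprojection error of matched points across views. The sketch below shows only that objective, under stated assumptions: a pinhole camera with a 3x3 rotation matrix, translation, a single focal length and principal point, and no lens distortion or robust kernel. The paper's actual parameterization and solver are not described here, and the function names `project` and `reprojection_error` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project(point: Vec3, R: Mat3, t: Vec3, f: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a world point to pixel coordinates with a distortion-free pinhole camera."""
    # World -> camera coordinates: X_c = R @ X_w + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then apply the (assumed) intrinsics f, cx, cy
    return f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy


def reprojection_error(points, cameras, observations) -> float:
    """Sum of squared pixel residuals over all observed point matches.

    points: list of 3D points (x, y, z)
    cameras: list of (R, t, f, cx, cy)
    observations: list of (cam_idx, pt_idx, (u, v)) measured pixel positions
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        R, t, f, cx, cy = cameras[cam_idx]
        pu, pv = project(points[pt_idx], R, t, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

A bundle adjustment optimizer would adjust the camera and point parameters to drive this sum down. Here the objective is written in pure Python for clarity; a practical implementation would use a sparse nonlinear least-squares solver.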
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to achieve high-fidelity Gaussian Splatting and meshing from only a small number of input views (4-6). Existing methods face the following challenges with sparse-view inputs:

1. **Geometric Accuracy**: Traditional pipelines usually rely on Structure-from-Motion (SfM) to generate the initial point cloud and camera poses for high-precision 3D reconstruction. With sparse views, however, SfM struggles to produce a sufficiently dense and complete point cloud, which limits geometric accuracy.
2. **Mesh Fidelity**: An accurate point cloud alone is not enough to reconstruct a high-fidelity mesh. Because sparse views provide limited multi-view information, geometric details are easily lost while the Gaussian primitives are optimized, degrading the quality of the reconstructed mesh.
3. **Insufficient Supervision**: Sparse-view inputs provide only limited supervision for Gaussian primitive optimization, which is therefore prone to getting trapped in local minima. Effective loss functions and regularization terms are needed to better guide the optimization.

To address these challenges, the authors propose GBR (Generative Bundle Refinement), a framework for high-fidelity Gaussian Splatting and meshing built on three key components:

1. **Neural Bundle Adjustment Module**: Combines a traditional bundle adjustment optimizer with a deep-learning-based geometric estimator (such as the DUSt3R network) to improve geometric accuracy and point cloud density.
2. **Generative Depth Refinement Module**: Uses a diffusion model to integrate high-resolution RGB information into the point cloud, enhancing geometric detail and smoothness while keeping the depth scale consistent.
3. **Multimodal Loss Function**: Combines depth, normal, geometric-consistency, pseudo-view, and photometric losses to provide stronger supervision, making Gaussian primitive optimization more accurate and robust.

With these components, GBR achieves high-quality camera parameter estimation, depth and normal map estimation, novel-view synthesis, and mesh reconstruction from only 4-6 input views. Experiments show that GBR significantly outperforms existing methods under sparse-view inputs and can reconstruct and render large-scale real-world scenes, such as the Pavilion of Prince Teng (Tengwang Pavilion) and the Great Wall, with remarkable detail.
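The multimodal loss combines several per-pixel terms into one weighted objective. The sketch below is an illustration under assumptions, not the paper's formulation: it uses mean L1 differences for the photometric, depth, and pseudo-view terms, mean (1 - cosine similarity) for normals, and hypothetical weights `w_rgb`, `w_depth`, `w_normal`, and `w_pseudo`. The geometric-regularization term is omitted because its form is not given here, and how the pseudo-view target is produced is likewise left open. All inputs are flat per-pixel lists.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized per-pixel lists."""
    assert len(a) == len(b) and len(a) > 0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def normal_loss(n_pred: Sequence[Vec3], n_ref: Sequence[Vec3]) -> float:
    """Mean (1 - cosine similarity) between predicted and reference normals."""
    assert len(n_pred) == len(n_ref) and len(n_pred) > 0
    total = 0.0
    for p, r in zip(n_pred, n_ref):
        dot = sum(pi * ri for pi, ri in zip(p, r))
        norm = math.sqrt(sum(pi * pi for pi in p)) * math.sqrt(sum(ri * ri for ri in r))
        total += 1.0 - dot / max(norm, 1e-8)  # guard against zero-length normals
    return total / len(n_pred)


def multimodal_loss(rgb, rgb_gt, depth, depth_ref, normals, normals_ref,
                    pseudo_rgb, pseudo_target,
                    w_rgb=1.0, w_depth=0.1, w_normal=0.05, w_pseudo=0.1) -> float:
    """Hypothetical weighted sum of photometric, depth, normal, and pseudo-view terms."""
    return (w_rgb * l1(rgb, rgb_gt)                         # photometric term on input views
            + w_depth * l1(depth, depth_ref)                # rendered vs. refined reference depth
            + w_normal * normal_loss(normals, normals_ref)  # normal consistency
            + w_pseudo * l1(pseudo_rgb, pseudo_target))     # supervision on a pseudo view
```

Keeping each term as a separate function makes it easy to reweight or disable individual terms, which matters under sparse views where some supervision signals may be unreliable.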