LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors

Hanyang Yu, Xiaoxiao Long, Ping Tan
2024-09-18
Abstract: We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable successes in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting on input images, and a lack of details. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and reliable point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are on our website.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### The Problem Addressed by the Paper

The paper aims to address the problem of reconstructing 3D scenes from sparse-view images. Although recent methods such as 3D Gaussian Splatting (3DGS) have achieved significant success in 3D reconstruction, they typically require hundreds of densely captured input images of the scene, making them both time-consuming and impractical for real-world applications. Sparse-view reconstruction, however, is inherently ill-posed and under-constrained, often leading to poor-quality and incomplete results. The main issues are initialization failures, overfitting to the input images, and a lack of detail.

To mitigate these challenges, the paper introduces LM-Gaussian, a method that can generate high-quality reconstructions from a limited number of images. Specifically, the paper proposes the following innovations:

1. **Robust Initialization Module**: Utilizes stereo priors to assist in recovering camera poses and reliable point clouds.
2. **Diffusion-Based Refinement**: Iteratively incorporates image diffusion priors into the Gaussian optimization process to preserve intricate scene details.
3. **Video Diffusion Priors**: Further enhances rendered images to achieve realistic visual effects.

Overall, the method significantly reduces the data acquisition requirements compared to previous 3DGS methods. Its effectiveness is validated through experiments on various public datasets, showcasing its potential for high-quality 360-degree scene reconstruction.
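
The order of the three stages described above can be illustrated with a small control-flow sketch. It is not the authors' implementation: the function name `run_pipeline`, the step labels, and the `optimize_steps` and `refine_every` parameters are all hypothetical, and the fixed refinement interval is an assumption made only to show what "iteratively applied" might look like. The code performs no reconstruction; it only records the order in which the stages would run.

```python
from typing import List


def run_pipeline(num_views: int, optimize_steps: int = 6, refine_every: int = 3) -> List[str]:
    """Return the ordered list of stage labels for a toy three-stage run.

    All names and parameters here are hypothetical placeholders, not the
    paper's API. Nothing is actually reconstructed.
    """
    log: List[str] = []

    # Stage 1: initialization that uses stereo priors to recover
    # camera poses and a point cloud from the sparse input views.
    log.append(f"init_from_stereo_priors(views={num_views})")

    # Stage 2: Gaussian optimization, with image-diffusion refinement
    # interleaved at a fixed interval (the interval is an assumption).
    for step in range(1, optimize_steps + 1):
        log.append(f"optimize_gaussians(step={step})")
        if step % refine_every == 0:
            log.append(f"refine_with_image_diffusion(step={step})")

    # Stage 3: video-diffusion enhancement applied to rendered images.
    log.append("enhance_renders_with_video_diffusion()")
    return log


if __name__ == "__main__":
    for entry in run_pipeline(num_views=3):
        print(entry)
```

The sketch keeps refinement inside the optimization loop and enhancement after it, matching the summary's description of refinement as part of Gaussian optimization and video-diffusion enhancement as a final step on rendered images.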