GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting

Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
2024-10-31
Abstract: We introduce GeoGS3D, a novel two-stage framework for reconstructing detailed 3D objects from single-view images. Inspired by the success of pre-trained 2D diffusion models, our method incorporates an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, facilitating the generation of multi-view consistent images. During the following Gaussian Splatting, these images are fused with epipolar attention, fully utilizing the geometric correlations across views. Moreover, we propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization, significantly accelerating the reconstruction process. Extensive experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects, both qualitatively and quantitatively.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the problem of reconstructing detailed 3D objects from a single image. It proposes GeoGS3D, a two-stage framework that combines a geometry-aware diffusion model with Gaussian splatting to produce multi-view consistent, geometrically detailed 3D objects from a single input image.

### Background and Challenges

1. **Importance of the Task**:
   - Single-view 3D reconstruction is crucial for machine understanding of the real 3D world and has wide applications in virtual reality (VR), augmented reality (AR), and robotics.
2. **Limitations of Existing Methods**:
   - Although multi-view diffusion models have made some progress in 3D reconstruction, they still struggle to maintain multi-view consistency and to handle complex geometric structures.
   - General reconstruction models proposed by other studies can generate high-quality 3D representations but usually require a large amount of computational resources.
   - Current methods based on 3D Gaussian splatting often overlook the spatial correspondence between views, which leads to unnecessary operations and longer optimization times.

### Solution

1. **Two-Stage Framework**:
   - **Generation Stage**: Extracts 3D geometric features from the input image through a geometry-aware multi-view generation mechanism and generates multi-view consistent images.
   - **Reconstruction Stage**: Fuses the generated multi-view images with epipolar attention during Gaussian splatting, and prunes unnecessary operations to accelerate 3D reconstruction.
2. **Key Technologies**:
   - **Geometry-Aware Multi-View Generation**: Extracts 3D geometric conditions through orthogonal plane decomposition and combines them with semantic conditions to generate high-quality multi-view images.
   - **Epipolar Attention**: Introduces an epipolar attention mechanism during optimization to exploit the geometric correlation between multi-view images, improving reconstruction quality.
   - **Gaussian Divergence Significance (GDS)**: Proposes a new metric for pruning unnecessary splitting and cloning operations, significantly accelerating the optimization process.

### Experimental Results

1. **Quantitative Evaluation**:
   - Experiments on the Objaverse and Google Scanned Objects datasets show that GeoGS3D outperforms existing baseline methods on metrics such as PSNR, SSIM, and LPIPS.
2. **Qualitative Analysis**:
   - The generated multi-view images are highly consistent with the reference view in geometry and semantics, and remain plausible under large viewpoint changes.
   - The 3D reconstruction results generalize to unseen data, producing high-quality 3D objects and maintaining good performance even on complex structures.

### Limitations and Future Work

1. **Fixed Number of Generated Views**: The current method generates a fixed number of views. Future work could explore adaptively generating different numbers of views to further reduce reconstruction time.
2. **Single Object Limitation**: The current method applies only to single-object 3D reconstruction. Future work could extend it to complex scenes or multi-object reconstruction.

### Conclusion

GeoGS3D reconstructs high-quality 3D objects from a single image through geometry-aware multi-view generation and Gaussian splatting, demonstrating consistency across views, high fidelity to the reference image, and plausible completion of unseen regions.
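The epipolar attention described above relies on a standard geometric fact: a pixel in one view can only match pixels along a single line (the epipolar line) in another view. The following is a minimal sketch of that constraint in Python with NumPy, not the paper's attention mechanism. The function names (`skew`, `fundamental_matrix`, `epipolar_line`, `project`), the camera intrinsics, and the relative pose are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_matrix(K1, K2, R, t):
    """Fundamental matrix for two pinhole cameras.

    Assumes points map from camera-1 to camera-2 coordinates as X2 = R @ X1 + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_line(F, pixel):
    """Line (a, b, c) in view 2, with a*u + b*v + c = 0, for a pixel (u, v) in view 1.

    The line is scaled so that (a, b) has unit length, assuming it is not at infinity.
    """
    x1 = np.array([pixel[0], pixel[1], 1.0])
    line = F @ x1
    return line / np.linalg.norm(line[:2])

def point_line_distance(line, pixel):
    """Distance in pixels from (u, v) to a line normalized as in epipolar_line."""
    return abs(line @ np.array([pixel[0], pixel[1], 1.0]))

def project(K, X):
    """Pinhole projection of a 3D point in camera coordinates to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

# Hypothetical setup: shared intrinsics, second camera rotated 30 degrees about the y axis.
K = np.array([[500.0, 0.0, 128.0],
              [0.0, 500.0, 128.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

X1 = np.array([0.2, -0.1, 2.0])   # a 3D point in camera-1 coordinates
p1 = project(K, X1)               # its pixel in view 1
p2 = project(K, R @ X1 + t)       # its pixel in view 2

F = fundamental_matrix(K, K, R, t)
line = epipolar_line(F, p1)
print("distance of true match to epipolar line (pixels):", point_line_distance(line, p2))
```

The printed distance should be zero up to floating-point error, since the true match always lies on the epipolar line. One plausible way to use this in an attention layer, stated here as an assumption rather than the paper's design, is to let each query pixel attend only to pixels of the other view that lie within a small distance of its epipolar line.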