LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Jaeyoung Chung, Suyoung Lee, Hyeongjin Nam, Jaerin Lee, Kyoung Mu Lee
2023-11-23
Abstract: With the widespread use of VR devices and content, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily because their training strategies rely on 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm that harmoniously integrates the newly generated portions of the 3D scene. The resulting 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

The paper proposes **LucidDreamer**, a method that addresses several key issues in 3D scene generation:

1. **Cross-Domain Generality**: Existing 3D scene generation models are often limited to specific domains (such as indoor or outdoor scenes), primarily because they are trained on 3D scan datasets that differ significantly from the real world. LucidDreamer leverages large-scale diffusion models (such as Stable Diffusion) to generate high-quality 3D scenes without domain restrictions.
2. **Multi-View Consistency**: Generated images must stay consistent and high-quality across viewpoints. To this end, LucidDreamer alternates between two steps, "Dreaming" and "Alignment," so that the multi-view images generated from the inputs (text, RGB images, or RGBD images) are consistent and high-resolution.
3. **Detail Richness**: By optimizing 3D Gaussian splats, LucidDreamer can generate more detailed and realistic 3D scenes than existing methods, addressing the voids (empty regions) left by traditional representations.

In summary, LucidDreamer aims to provide a cross-domain 3D scene generation solution that accepts various input types and produces high-quality 3D scenes with multi-view consistency.
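
The lifting and projection steps described in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation. It assumes an ideal pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a depth map that is already given (the paper estimates it), and a new view that differs from the first only by a sideways translation `shift_x`. The function names `lift_to_3d` and `project_to_view` are hypothetical. The returned mask marks which pixels of the new view are covered by existing points; the uncovered pixels are the ones an inpainting model would have to fill.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def lift_to_3d(depth: List[List[float]],
               fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Back-project every pixel (u, v) with depth z into camera space."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project_to_view(points: List[Point],
                    fx: float, fy: float, cx: float, cy: float,
                    width: int, height: int,
                    shift_x: float = 0.0) -> List[List[bool]]:
    """Project points into a camera translated by shift_x along the x axis.

    Returns a coverage mask: True where a point lands, False where the
    new view has no geometry yet (the region left for inpainting).
    Assumption: the camera only translates; a full pipeline would apply
    a general rotation and translation.
    """
    mask = [[False] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # points behind the camera are not visible
        xc = x - shift_x  # point expressed in the new camera's frame
        u = round(fx * xc / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            mask[v][u] = True
    return mask
```

For example, lifting a flat depth map and then calling `project_to_view` with a nonzero `shift_x` leaves a strip of uncovered pixels along one edge of the mask, which corresponds to the newly revealed region of the scene.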