DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

Xuening Yuan, Hongyu Yang, Yueming Zhao, Di Huang
2024-04-14
Abstract: Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide ($3{DG^2}$) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses the training instability arising from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects like rain and snow distributed extensively across the scene, we introduce a sparse initialization and densification strategy. Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from only text prompts and achieving state-of-the-art performance compared to other methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulty of generating high-quality 3D scenes from textual descriptions. Existing methods struggle to arrange multiple objects in complex layouts, and their results can exhibit 3D inconsistencies and geometric distortions. They also have difficulty generating "diffuse objects" such as rain, snow, or petals that are scattered throughout a scene. To address these issues, the authors propose DreamScape, which combines the 3D representation strength of Gaussian Splatting with the layout-planning ability of large language models (LLMs). DreamScape introduces the following components:

1. **3D Gaussian Guide ($3{DG^2}$)**: An LLM parses the text prompt into a scene description and produces an initial 3D Gaussian guide that specifies the position, size, and relationships of each object.
2. **Local-to-Global Optimization Strategy**: In the local phase, each object is generated in detail on its own to ensure per-object 3D consistency; in the global phase, the whole scene is optimized for overall consistency and to capture interactions between objects, such as water ripples and reflections.
3. **Progressive Scale Control**: To prevent shape distortion when objects are rescaled to their scene size, DreamScape gradually adjusts each object's scale during local generation, preserving good geometry and texture detail.
4. **Collision Loss**: To keep objects from overlapping or being misaligned, a collision loss is applied during global optimization to enforce physical correctness.
5. **Sparse Initialization and Densification Strategy**: For diffuse objects, DreamScape uses a sparse initialization and a dedicated densification strategy that keeps small objects from clumping together, yielding more realistic scenes.

With these components, DreamScape generates high-quality 3D scenes from text descriptions alone, supports various editing operations, and achieves state-of-the-art performance in 3D scene generation.
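The 3D Gaussian Guide starts from a structured scene layout produced by an LLM. The paper's exact output schema is not given here, so the sketch below assumes a hypothetical JSON format with an `objects` list (name, per-object prompt, position, scale, optional yaw) and `relations` triples. `SceneObject`, `SceneGuide`, and `parse_guide` are illustrative names, not part of DreamScape's code.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for an LLM-produced scene layout; DreamScape's
# actual prompt and output format may differ.

@dataclass
class SceneObject:
    name: str                              # semantic primitive, e.g. "boat"
    prompt: str                            # text prompt for local generation
    position: tuple[float, float, float]   # object center in scene coordinates
    scale: float                           # size relative to a unit-normalized object
    yaw_deg: float = 0.0                   # rotation about the up axis

@dataclass
class SceneGuide:
    objects: dict[str, SceneObject] = field(default_factory=dict)
    # (subject, predicate, target) triples, e.g. ("boat", "floats on", "lake")
    relations: list[tuple[str, str, str]] = field(default_factory=list)

def parse_guide(llm_json: str) -> SceneGuide:
    """Turn the (hypothetical) JSON layout into typed objects and relations."""
    data = json.loads(llm_json)
    guide = SceneGuide()
    for o in data["objects"]:
        obj = SceneObject(
            name=o["name"],
            prompt=o["prompt"],
            position=tuple(float(v) for v in o["position"]),
            scale=float(o["scale"]),
            yaw_deg=float(o.get("yaw_deg", 0.0)),
        )
        guide.objects[obj.name] = obj
    for subj, pred, target in data.get("relations", []):
        # Reject relations that mention objects the layout never defined.
        if subj not in guide.objects or target not in guide.objects:
            raise ValueError(f"unknown object in relation: {subj!r}, {target!r}")
        guide.relations.append((subj, pred, target))
    return guide

EXAMPLE = """
{"objects": [
   {"name": "lake", "prompt": "a calm mountain lake", "position": [0, 0, 0], "scale": 4.0},
   {"name": "boat", "prompt": "a small wooden boat", "position": [0.5, 0.1, 0.0], "scale": 1.0}
 ],
 "relations": [["boat", "floats on", "lake"]]}
"""

guide = parse_guide(EXAMPLE)
print(sorted(guide.objects), guide.relations)
# prints: ['boat', 'lake'] [('boat', 'floats on', 'lake')]
```

Validating relations at parse time is one way to catch LLM layouts that refer to objects they never declared before any optimization starts.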
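In the global phase, locally optimized objects have to be placed into one shared scene frame. A minimal sketch, assuming each object is stored as 3D Gaussians with means and full covariance matrices and is placed by a similarity transform (uniform scale $s$, rotation $R$, translation $t$): means map as $x' = sRx + t$ and covariances as $\Sigma' = s^2 R \Sigma R^\top$. Standard 3D Gaussian Splatting stores per-Gaussian scale vectors and quaternions rather than $\Sigma$; in that parameterization one would multiply the scales by $s$ and compose $R$ with each quaternion instead. `yaw_matrix`, `place_gaussians`, and `compose_scene` are hypothetical helpers, and the z-up convention is an assumption.

```python
import numpy as np

def yaw_matrix(deg: float) -> np.ndarray:
    """Rotation about the z axis (z-up is an assumed convention)."""
    t = np.deg2rad(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def place_gaussians(means, covs, scale: float, rot: np.ndarray, translation):
    """Apply x' = s R x + t to means (N, 3) and S' = s^2 R S R^T to covariances (N, 3, 3)."""
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    # Row-vector form: (R x)^T == x^T R^T.
    new_means = scale * means @ rot.T + np.asarray(translation, dtype=float)
    # rot @ covs broadcasts over the leading N axis of the covariance stack.
    new_covs = scale**2 * (rot @ covs @ rot.T)
    return new_means, new_covs

def compose_scene(objects):
    """Concatenate several placed objects into one scene-level Gaussian set.

    `objects` is a list of (means, covs, scale, yaw_deg, translation) tuples.
    """
    placed = [place_gaussians(m, c, s, yaw_matrix(y), t) for m, c, s, y, t in objects]
    means = np.concatenate([p[0] for p in placed])
    covs = np.concatenate([p[1] for p in placed])
    return means, covs

# Usage: one unit Gaussian, doubled in size, turned 90 degrees, lifted by 1.
m, c = compose_scene([([[1.0, 0.0, 0.0]], [np.eye(3)], 2.0, 90.0, [0.0, 0.0, 1.0])])
```

Because the transform is applied per object and then concatenated, each object keeps its own locally optimized Gaussians, and only the placement parameters need to change when the layout is edited.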
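The exact schedule used for progressive scale control is not given here. As a hypothetical sketch, the scale factor applied to an object could move from its normalized size (1.0) to its target scene scale over a fixed number of steps, interpolating in log space so that each step changes the size by the same ratio. `progressive_scale` and the log-space choice are assumptions, not the paper's formulation.

```python
import math

def progressive_scale(step: int, total_steps: int, target_scale: float,
                      start_scale: float = 1.0) -> float:
    """Hypothetical schedule: move from start_scale to target_scale in log space.

    Each step multiplies the size by the same ratio; after total_steps the
    target scale is held fixed.
    """
    if total_steps <= 0:
        return target_scale
    t = min(max(step / total_steps, 0.0), 1.0)   # clamp progress to [0, 1]
    log_s = (1.0 - t) * math.log(start_scale) + t * math.log(target_scale)
    return math.exp(log_s)

# Usage: shrink an object to a quarter of its normalized size over 1,000 steps.
schedule = [progressive_scale(s, 1000, 0.25) for s in (0, 500, 1000, 2000)]
```

For a target of 0.25, the halfway value is 0.5 (the geometric mean of 1.0 and 0.25), so shrinking and growing by the same factor are treated symmetrically, which a linear ramp would not do.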
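One simple way to express a collision penalty is the total overlap volume between objects' axis-aligned bounding boxes. This is a swapped-in illustration, not DreamScape's collision loss: the box representation, the `allowed` set of pairs that may intersect (for example, a boat partly submerged in a lake), and all function names are hypothetical. In actual training the same expression would be written in an autodiff framework so that gradients can move object positions apart.

```python
from itertools import combinations

# An axis-aligned box as (min_corner, max_corner).
Box = tuple[tuple[float, float, float], tuple[float, float, float]]

def overlap_volume(a: Box, b: Box) -> float:
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint or touching)."""
    vol = 1.0
    for axis in range(3):
        lo = max(a[0][axis], b[0][axis])
        hi = min(a[1][axis], b[1][axis])
        if hi <= lo:
            return 0.0
        vol *= hi - lo
    return vol

def collision_loss(boxes: dict[str, Box], allowed=frozenset()) -> float:
    """Sum of pairwise overlap volumes, skipping pairs listed in `allowed`.

    `allowed` holds frozenset({name_a, name_b}) entries for pairs the layout
    permits to intersect (a hypothetical mechanism, not from the paper).
    """
    loss = 0.0
    for (name_a, box_a), (name_b, box_b) in combinations(boxes.items(), 2):
        if frozenset((name_a, name_b)) in allowed:
            continue
        loss += overlap_volume(box_a, box_b)
    return loss

boxes = {
    "table": ((0.0, 0.0, 0.0), (2.0, 2.0, 1.0)),
    "chair": ((1.5, 1.5, 0.0), (2.5, 2.5, 1.0)),
    "lamp": ((5.0, 5.0, 0.0), (5.5, 5.5, 2.0)),
}
```

Here the table and chair intersect in a 0.5 × 0.5 × 1.0 region, so `collision_loss(boxes)` is 0.25, and it drops to 0.0 when that pair is listed in `allowed`.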
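For diffuse objects, a sparse initialization can be sketched as dart-throwing sampling: draw uniform random points inside the scene bounds and accept a point only if it lies at least `min_dist` from every accepted point, so seeds cannot clump. The densification step here simply clones each seed into a few jittered copies within a small cube. Both `sparse_init` and `densify_local` are hypothetical stand-ins, since the paper's specific densification rule is not described here.

```python
import math
import random

def sparse_init(n: int, bounds, min_dist: float, seed: int = 0,
                max_tries: int = 10_000):
    """Dart-throwing sampler: accept a uniform random point only if it is
    at least min_dist from every previously accepted point."""
    rng = random.Random(seed)
    (x0, y0, z0), (x1, y1, z1) = bounds
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        if all(math.dist(p, q) >= min_dist for q in points):
            points.append(p)
    return points  # may hold fewer than n points if max_tries runs out

def densify_local(points, copies: int, radius: float, seed: int = 0):
    """Hypothetical densification: clone each seed into `copies` points,
    each jittered by at most `radius` along every axis."""
    rng = random.Random(seed)
    out = []
    for p in points:
        for _ in range(copies):
            out.append(tuple(c + rng.uniform(-radius, radius) for c in p))
    return out

# Usage: sparse seeds for falling snow, then small local clusters around each.
seeds = sparse_init(200, ((-5.0, -5.0, 0.0), (5.0, 5.0, 8.0)), min_dist=0.4)
particles = densify_local(seeds, copies=4, radius=0.05)
```

Keeping the jitter half-width below `min_dist / (2 * sqrt(3))` guarantees that clusters grown from different seeds cannot overlap, so the spacing enforced at initialization survives densification.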