Abstract: Recent progress in text-to-3D creation has been propelled by integrating the potent prior of diffusion models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide ($3{DG^2}$) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control strategy is applied during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses the training instability that arises from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects such as rain and snow that are distributed extensively across the scene, we introduce a sparse initialization and densification strategy. Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from text prompts alone and achieving state-of-the-art performance compared with other methods.
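To make the compositional representation concrete, the sketch models a scene layout as a list of object primitives, as an LLM might return them in JSON, together with a pairwise collision penalty. Everything here is an assumption for illustration: the JSON schema, the names `ObjectPrimitive`, `parse_layout`, `overlap_volume`, and `collision_penalty`, the use of axis-aligned boxes (rotations are omitted), and the overlap-volume penalty itself, which is a simple stand-in rather than DreamScape's actual collision modeling on Gaussians.

```python
import json
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectPrimitive:
    """One semantic primitive in a hypothetical scene layout (rotation omitted)."""
    name: str     # per-object text prompt, e.g. "a wooden table"
    center: Vec3  # box center in scene coordinates
    size: Vec3    # box extent along x, y, z

    def bounds(self) -> Tuple[Vec3, Vec3]:
        lo = tuple(c - s / 2 for c, s in zip(self.center, self.size))
        hi = tuple(c + s / 2 for c, s in zip(self.center, self.size))
        return lo, hi


def parse_layout(json_text: str) -> List[ObjectPrimitive]:
    """Parse an assumed LLM output schema: {"objects": [{name, center, size}, ...]}."""
    data = json.loads(json_text)
    return [ObjectPrimitive(o["name"], tuple(o["center"]), tuple(o["size"]))
            for o in data["objects"]]


def overlap_volume(a: ObjectPrimitive, b: ObjectPrimitive) -> float:
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint)."""
    (a_lo, a_hi), (b_lo, b_hi) = a.bounds(), b.bounds()
    volume = 1.0
    for i in range(3):
        extent = min(a_hi[i], b_hi[i]) - max(a_lo[i], b_lo[i])
        if extent <= 0:
            return 0.0
        volume *= extent
    return volume


def collision_penalty(objects: List[ObjectPrimitive]) -> float:
    """Stand-in collision term: total pairwise overlap volume (0 when nothing intersects)."""
    return sum(overlap_volume(a, b) for a, b in combinations(objects, 2))


if __name__ == "__main__":
    layout = parse_layout("""{"objects": [
      {"name": "a wooden table", "center": [0, 0, 0.4], "size": [1.2, 0.8, 0.8]},
      {"name": "a ceramic vase", "center": [0, 0, 0.9], "size": [0.2, 0.2, 0.4]}
    ]}""")
    print(collision_penalty(layout))
```

In this toy layout the vase box sinks about 0.1 units into the tabletop, so the printed penalty is small but positive (roughly 0.004); minimizing a term like this during global optimization would push overlapping objects apart.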

What problem does this paper attempt to address?

The paper addresses the challenge of generating high-quality, high-fidelity 3D scenes from textual descriptions. Existing methods struggle to arrange objects in scenes that contain multiple items, and their results can exhibit 3D inconsistencies and geometric distortions. These methods also have difficulty generating "pervasive objects", such as rain, snow, or petals, that are scattered throughout the scene. To address these issues, the authors propose DreamScape, a method that combines the strengths of Gaussian Splatting and large language models (LLMs). DreamScape improves on existing techniques through the following innovations:

1. **3D Gaussian Guide ($3{DG^2}$)**: An LLM parses the text prompt to produce an initial 3D Gaussian Guide, which specifies the position, size, and relationships of each object in the scene.
2. **Local-to-Global Optimization Strategy**: In the local phase, DreamScape focuses on the detailed generation of individual objects to ensure that each one is 3D-consistent; in the global phase, it optimizes the consistency of the whole scene and captures interactions between objects, such as water ripples and reflections.
3. **Progressive Scale Control**: To prevent shape distortion when objects are rescaled to fit the scene, DreamScape gradually adjusts the size proportions of objects, preserving their geometry and texture.
4. **Collision Loss**: To avoid overlapping or misplaced objects, DreamScape adds a collision loss during the global optimization phase to enforce physical correctness.
5. **Sparse Initialization and Densification Strategy**: For pervasive objects, DreamScape uses a sparse initialization and a dedicated densification strategy that keeps small objects from clumping together, producing more realistic scenes.

Through these innovations, DreamScape generates high-quality 3D scenes from textual descriptions alone, supports various editing functions, and achieves state-of-the-art performance in 3D scene generation.
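Two of the listed mechanisms lend themselves to a small sketch. The code assumes a geometric (log-space) schedule for progressive scale control, moving an object's scale from its canonical value to the LLM-specified target over a fixed number of steps, and it treats densification of pervasive objects as minimum-distance rejection sampling so that new particles cannot clump. The function names (`progressive_scale`, `sparse_init`, `densify_spread`), the schedule shape, and the spacing rule are hypothetical stand-ins, not DreamScape's published implementation; a real system would operate on Gaussian parameters rather than bare 3D points.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Bounds = Tuple[Sequence[float], Sequence[float]]  # (min corner, max corner)


def progressive_scale(step: int, total_steps: int, target_scale: float,
                      start_scale: float = 1.0) -> float:
    """Hypothetical schedule: interpolate the scale geometrically from
    start_scale to target_scale over total_steps, then hold it."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start_scale * (target_scale / start_scale) ** t


def _uniform_point(bounds: Bounds, rng: random.Random) -> Point:
    lo, hi = bounds
    return tuple(rng.uniform(lo[i], hi[i]) for i in range(3))


def sparse_init(n: int, bounds: Bounds, rng: random.Random) -> List[Point]:
    """Scatter a small number of seed centers uniformly over the scene volume."""
    return [_uniform_point(bounds, rng) for _ in range(n)]


def densify_spread(points: List[Point], n_new: int, bounds: Bounds,
                   min_dist: float, rng: random.Random,
                   max_tries: int = 10_000) -> List[Point]:
    """Add up to n_new points, rejecting any candidate closer than min_dist
    to an existing point, so particles stay spread out instead of clumping."""
    pts = list(points)
    added = tries = 0
    while added < n_new and tries < max_tries:
        tries += 1
        cand = _uniform_point(bounds, rng)
        if all(math.dist(cand, p) >= min_dist for p in pts):
            pts.append(cand)
            added += 1
    return pts


if __name__ == "__main__":
    rng = random.Random(0)
    scene = ((-5.0, -5.0, 0.0), (5.0, 5.0, 10.0))
    snow = densify_spread(sparse_init(8, scene, rng), 200, scene, 0.5, rng)
    print(len(snow), progressive_scale(500, 1000, 0.1))
```

The geometric schedule is one plausible choice because it changes the scale by the same ratio at every step, which behaves evenly even when the target is an order of magnitude smaller than the canonical size; a linear schedule would be an equally simple alternative.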

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling