DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Haoran Li, Haolin Shi, Wenli Zhang, Wenjun Wu, Yong Liao, Lin Wang, Lik-hang Lee, Pengyuan Zhou
2024-07-19
Abstract: Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. Code and demos will be released at https://dreamscene-project.github.io.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper targets three key challenges in current text-to-3D scene generation methods:

1. **Inefficient generation process**: Existing methods often produce low-quality results and take a long time to complete.
2. **Inconsistent 3D visual cues**: Results look good from specific camera positions, but the scene as a whole has poor 3D consistency.
3. **Difficulty in separating objects from the environment**: Existing methods cannot effectively separate objects from the environment, which limits flexible editing of individual elements.

To address these challenges, the paper proposes **DreamScene**, a novel text-to-3D scene generation framework based on 3D Gaussians. DreamScene relies mainly on two strategies:

1. **Formation Pattern Sampling (FPS)**:
   - FPS is a multi-timestep sampling strategy guided by the formation patterns of 3D objects. It quickly forms semantically rich, high-quality representations.
   - FPS uses 3D Gaussian filtering for optimization stability and leverages reconstruction techniques to generate plausible textures.
2. **Progressive three-stage camera sampling strategy**:
   - This strategy is designed for both indoor and outdoor settings and ensures object-environment integration as well as scene-wide 3D consistency.

In addition, by integrating objects and environments, DreamScene makes scene editing more flexible and enables targeted adjustments.

The paper reports extensive experiments in which DreamScene outperforms current state-of-the-art techniques, indicating broad application potential for generating high-quality, consistent, and editable 3D scenes. According to the abstract, code and demos will be released at [https://dreamscene-project.github.io](https://dreamscene-project.github.io).
### Main contributions

- **Proposing DreamScene**: A novel text-driven 3D scene generation framework that efficiently generates high-quality, scene-level consistent, and editable 3D scenes through formation pattern sampling, strategic camera sampling, and seamless object-environment integration.
- **Formation Pattern Sampling (FPS)**: Combines multi-timestep sampling, 3D Gaussian filtering, and reconstruction-based generation to produce high-quality, semantically rich 3D representations within 30 minutes.
- **Qualitative and quantitative experiments**: Demonstrate that DreamScene outperforms existing methods in text-driven 3D object and scene generation, showing its potential in fields such as games and movies.
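
The idea of multi-timestep sampling in FPS can be illustrated with a small sketch. The text above does not specify the actual schedule, so everything here is an assumption made for illustration and not the authors' algorithm: the function names `timestep_window` and `sample_timesteps`, the timestep bounds `t_min` and `t_max`, the window `width`, and the choice of a window that slides linearly from high-noise to low-noise timesteps as optimization progresses. At each optimization step, the sketch draws `m` diffusion timesteps, one from each equal slice of the current window.

```python
import random


def timestep_window(step, total_steps, t_min=20, t_max=980, width=200):
    """Return a hypothetical (low, high) timestep window for this step.

    Assumption: the window slides linearly from the noisy end of the
    schedule (near t_max) to the clean end (near t_min) during training.
    """
    progress = step / max(total_steps - 1, 1)
    high = round(t_max - progress * (t_max - t_min - width))
    low = high - width
    return low, high


def sample_timesteps(step, total_steps, m=3, rng=None):
    """Draw m timesteps, one from each equal slice of the current window."""
    rng = rng or random.Random()
    low, high = timestep_window(step, total_steps)
    # Slice boundaries; each slice contributes exactly one sampled timestep.
    edges = [low + (high - low) * i / m for i in range(m + 1)]
    return [rng.randint(int(edges[i]), int(edges[i + 1])) for i in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 99):
        print(step, timestep_window(step, 100), sample_timesteps(step, 100, rng=rng))
```

In a real pipeline the sampled timesteps would drive the diffusion guidance applied to renders of the 3D Gaussians. That part is omitted here because the text above gives no details about it.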