Abstract: Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.

What problem does this paper attempt to address?

This paper addresses the fast generation of high-quality, consistent 3D indoor scenes from text. Existing 3D scene generation methods have the following problems:

1. **Long generation time**: Existing methods usually take a long time to generate a 3D scene, which hurts the user experience.
2. **Manual specification of motion parameters**: Some methods require users to specify complex motion parameters by hand, which makes them harder to use.
3. **Poor global consistency**: Existing methods often generate the scene iteratively from narrow-field viewpoints, which compromises global consistency and overall scene quality.

To overcome these problems, the authors propose the FastScene framework, which generates high-quality, consistent 3D indoor scenes from a text prompt. Its main contributions are as follows:

1. **Fast generation of high-quality scenes**: FastScene generates a high-quality 3D scene in a short time without pre-designed camera parameters or motion trajectories, which improves user-friendliness.
2. **Panorama generation and depth estimation**: The pre-trained Diffusion360 generates the panorama and EGformer estimates its depth, capturing the spatial information of the scene.
3. **Multi-View Projection and 3D Gaussian Splatting**: Multi-View Projection (MVP) converts the panoramas into perspective views, and 3D Gaussian Splatting (3DGS) then reconstructs the scene.
4. **Progressive Novel View Inpainting**: The Progressive Novel View Inpainting (PNVI) strategy gradually fills the holes in the panorama, ensuring scene consistency and view quality.

In summary, FastScene addresses the shortcomings of existing 3D scene generation methods in speed, quality, and consistency by combining panorama generation, depth estimation, progressive inpainting, and Gaussian Splatting, providing a user-friendly solution.
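
The geometry behind the panorama, depth, and Multi-View Projection steps above can be illustrated with a small sketch. It is not the authors' implementation: the axis convention (+y up, +z forward, longitude 0 at the image centre), the function names, and the square pinhole view with a yaw-only rotation are all assumptions made for illustration. It shows how an equirectangular pixel with a depth value maps to a 3D point, and how a pixel of a perspective view finds its sampling position in the panorama.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Unit ray through the centre of equirectangular pixel (u, v).

    Assumed convention: longitude 0 at the image centre, +y up, +z forward.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))


def backproject(u, v, depth, width, height):
    """3D point for a panorama pixel, given its depth along the ray."""
    x, y, z = pixel_to_ray(u, v, width, height)
    return (depth * x, depth * y, depth * z)


def ray_to_pixel(x, y, z, width, height):
    """Inverse of pixel_to_ray: continuous panorama coordinates of a direction."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(y / norm)
    u = (lon / (2.0 * math.pi) + 0.5) * width - 0.5
    v = (0.5 - lat / math.pi) * height - 0.5
    return (u, v)


def perspective_to_panorama(i, j, size, fov_deg, yaw_deg, width, height):
    """Panorama coordinates sampled by pixel (i, j) of a square pinhole view.

    The view is rotated by yaw_deg around the vertical axis (hypothetical
    parameterisation; pitch and roll are omitted for brevity).
    """
    focal = 0.5 * size / math.tan(math.radians(fov_deg) / 2.0)
    cx = (i + 0.5) - size / 2.0   # camera-space x, pointing right
    cy = size / 2.0 - (j + 0.5)   # camera-space y, pointing up
    yaw = math.radians(yaw_deg)
    x = cx * math.cos(yaw) + focal * math.sin(yaw)
    z = -cx * math.sin(yaw) + focal * math.cos(yaw)
    return ray_to_pixel(x, cy, z, width, height)
```

Calling `perspective_to_panorama` for every pixel of several views with different yaw angles, then sampling the panorama at the returned coordinates, yields a set of perspective images of the kind a 3DGS reconstruction can consume. The number of views and the field of view are free parameters in this sketch.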

FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting

SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

SceneWiz3D: Towards Text-guided 3D Scene Composition

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

PaintScene4D: Consistent 4D Scene Generation from Text Prompts

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting

SceneCraft: Layout-Guided 3D Scene Generation

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

SceneTeller: Language-to-3D Scene Generation

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

A new framework for automatic 3D scene construction from text description

DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

Fast 3D Indoor Scene Synthesis by Learning Spatial Relation Priors of Objects

Toward Scene Graph and Layout Guided Complex 3D Scene Generation
