SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

Zhaoxi Chen, Guangcong Wang, Ziwei Liu

DOI: https://doi.org/10.1109/TPAMI.2023.3321857

2023-12-08

Abstract: In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
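
The BEV representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it swaps simplex noise for a simpler value noise so that it runs on the standard library alone, and the label thresholds in `LABELS`, along with the helper names `value_noise_grid`, `label_for`, and `make_bev_scene`, are hypothetical. Deriving the semantic map by thresholding height is also an assumption made for brevity. The point is only to show a scene stored as two 2D maps, which is where the quadratic complexity comes from, compared with a cubic voxel grid.

```python
import random

# Hypothetical height thresholds (upper bound, label); not taken from the paper.
LABELS = [(0.35, "water"), (0.45, "sand"), (0.70, "grass"), (1.01, "rock")]


def value_noise_grid(size, cell, seed):
    """Smooth 2D value noise in [0, 1]; a simple stand-in for simplex noise."""
    rng = random.Random(seed)
    n = size // cell + 2  # lattice points needed to cover the whole grid
    lattice = [[rng.random() for _ in range(n)] for _ in range(n)]

    def smooth(t):
        # Smoothstep easing so the interpolation has no visible creases.
        return t * t * (3 - 2 * t)

    grid = []
    for y in range(size):
        gy, ry = divmod(y, cell)
        ty = smooth(ry / cell)
        row = []
        for x in range(size):
            gx, rx = divmod(x, cell)
            tx = smooth(rx / cell)
            # Bilinear blend of the four surrounding lattice values.
            top = lattice[gy][gx] * (1 - tx) + lattice[gy][gx + 1] * tx
            bot = lattice[gy + 1][gx] * (1 - tx) + lattice[gy + 1][gx + 1] * tx
            row.append(top * (1 - ty) + bot * ty)
        grid.append(row)
    return grid


def label_for(h):
    """Map a height value to a semantic label (assumed rule, for illustration)."""
    for upper, name in LABELS:
        if h < upper:
            return name
    return LABELS[-1][1]


def make_bev_scene(size=64, cell=16, seed=0):
    """Return (height field, semantic field), both size x size 2D maps."""
    height = value_noise_grid(size, cell, seed)
    semantic = [[label_for(h) for h in row] for row in height]
    return height, semantic


height, semantic = make_bev_scene()
bev_cells = 2 * len(height) * len(height[0])  # two 2D maps: O(N^2) storage
voxel_cells = len(height) ** 3                # dense voxel grid: O(N^3) storage
print(bev_cells, voxel_cells)
```

Because geometry lives in `height` and semantics in `semantic`, either map can be edited or resampled without touching the other, which is one way to read the "disentangled geometry and semantics" claim.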

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The main goal of this paper is to generate large, diverse 3D landscapes with SceneDreamer, an unconditional generative model for unbounded 3D scenes that is trained only on collections of 2D images gathered in the wild, without 3D annotations. Specifically:

1. **Efficient and Expressive 3D Scene Representation**: Proposes a bird's-eye-view (BEV) scene representation, consisting of a height field and a semantic field, that supports efficient training and keeps geometry and semantics disentangled.
2. **Generative Neural Hash Grid**: Introduces a semantic-aware neural hash grid that parameterizes latent features as a function of 3D position and scene semantics, so that the learned features generalize across different scenes.
3. **Volume Renderer**: Uses a style-modulated volume renderer, trained adversarially on in-the-wild 2D image collections, to produce photorealistic and 3D-consistent images.

Together, these components let SceneDreamer generate unbounded 3D worlds with high detail and diversity, addressing the difficulty existing methods have with large-scale unbounded scenes.
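
To make the second item more concrete, here is a minimal sketch of a hash-grid lookup keyed on both position and semantics. It is an illustration under stated assumptions, not the paper's method: the class `SemanticHashGrid` and the constant `SEMANTIC_PRIME` are hypothetical, an integer `semantic_id` stands in for whatever scene conditioning the model actually uses, and the grid is single-resolution with no interpolation. The first three multipliers are the spatial-hash primes published with Instant-NGP.

```python
import random

# Spatial-hash multipliers as published with Instant-NGP (x, y, z).
PRIMES = (1, 2654435761, 805459861)
# Arbitrary odd constant for the extra semantic dimension (hypothetical choice).
SEMANTIC_PRIME = 3674653429


class SemanticHashGrid:
    """Single-level hash table of feature vectors keyed by (x, y, z, semantic_id)."""

    def __init__(self, table_size=2 ** 14, feature_dim=4, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        # In a trained model these entries would be learnable parameters;
        # here they are just small random values.
        self.table = [
            [rng.uniform(-1e-2, 1e-2) for _ in range(feature_dim)]
            for _ in range(table_size)
        ]

    def index(self, x, y, z, semantic_id):
        """XOR the scaled integer coordinates and semantic id, then wrap."""
        h = (
            (x * PRIMES[0])
            ^ (y * PRIMES[1])
            ^ (z * PRIMES[2])
            ^ (semantic_id * SEMANTIC_PRIME)
        )
        return h % self.table_size

    def lookup(self, x, y, z, semantic_id):
        """Return the feature vector stored for this voxel and semantic class."""
        return self.table[self.index(x, y, z, semantic_id)]


grid = SemanticHashGrid()
grass_feature = grid.lookup(10, 3, 7, semantic_id=2)
water_feature = grid.lookup(10, 3, 7, semantic_id=0)
print(len(grass_feature), grass_feature is water_feature)
```

Including the semantic term in the hash means the same 3D position maps to different table entries for different classes, so the table size stays fixed no matter how far the scene extends.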

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos

HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Diffusion-based Generation, Optimization, and Planning in 3D Scenes

VividDream: Generating 3D Scene with Ambient Dynamics

Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion

SceneCraft: Layout-Guided 3D Scene Generation

Indoor Scene Generation from a Collection of Semantic-Segmented Depth Images

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Persistent Nature: A Generative Model of Unbounded 3D Worlds