HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions

Haiyang Zhou, Xinhua Cheng, Wangbo Yu, Yonghong Tian, Li Yuan

2024-07-21

Abstract: 3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models, which provide reliable priors, creating 3D scenes from text prompts alone has become viable, significantly advancing research in text-driven 3D scene generation. To obtain multi-view supervision from 2D diffusion models, prevailing methods typically use a diffusion model to generate an initial local image and then iteratively outpaint that image with diffusion models to gradually build up the scene. Nevertheless, these outpainting-based approaches are prone to producing globally inconsistent scenes that lack a high degree of completeness, which restricts their broader application. To tackle these problems, we introduce HoloDreamer, a framework that first generates a high-definition panorama as a holistic initialization of the full 3D scene, then leverages 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to enable stylized and detailed equirectangular panorama generation from complex text prompts. Subsequently, we introduce Enhanced Two-Stage Panorama Reconstruction, which conducts a two-stage optimization of 3D-GS to inpaint the missing regions and enhance the integrity of the scene. Comprehensive experiments demonstrate that our method outperforms prior works in overall visual consistency and harmony, as well as in reconstruction quality and rendering robustness, when generating fully enclosed scenes.

Computer Vision and Pattern Recognition, Graphics

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper aims to address a series of challenges in generating 3D scenes based on textual descriptions. Specifically:

1. **Limitations of Existing Methods**:
   - Current methods typically use diffusion models to generate initial local images and then expand them step by step (outpainting) to create a complete scene. This approach often leads to poor global consistency, especially when generating fully enclosed 3D scenes, resulting in visual confusion.
2. **Proposed New Framework HoloDreamer**:
   - To overcome the above issues, the authors propose the HoloDreamer framework. This framework first generates high-quality panoramic images directly from text prompts as the overall initialization of the 3D scene, and then uses 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, achieving viewpoint-consistent and fully enclosed 3D scene generation.
3. **Main Contributions**:
   - Proposed Stylized Equirectangular Panorama Generation, combining multiple diffusion models to generate equirectangular panoramas with stylized details.
   - Introduced the Enhanced Two-Stage Panorama Reconstruction module, which improves the optimization of 3D-GS through multi-view constraints and image inpainting techniques, reducing artifacts and enhancing scene integrity.

Through these innovations, HoloDreamer demonstrates better visual consistency, harmony, and rendering robustness in generating fully enclosed 3D scenes, surpassing existing text-driven 3D scene generation methods.
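To make the term "equirectangular panorama" concrete, here is a minimal sketch of the standard mapping between panorama pixels and viewing directions on the unit sphere. It is not taken from the paper: the function names `pixel_to_ray` and `ray_to_pixel`, the axis convention (y up, z forward at the image centre), and the half-pixel offset are all assumptions chosen for illustration, and HoloDreamer's own implementation may use different conventions.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction (x, y, z).

    Assumed convention (illustrative only): columns span longitude
    [-pi, pi), rows span latitude from +pi/2 (top) to -pi/2 (bottom),
    and integer coordinates refer to pixel centres.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def ray_to_pixel(direction, width, height):
    """Inverse mapping: project a 3D direction back to panorama pixel coordinates."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return (u, v)


if __name__ == "__main__":
    # Direction seen through one pixel of a 2048 x 1024 panorama.
    print(pixel_to_ray(100, 200, 2048, 1024))
```

Because every pixel corresponds to exactly one direction on the full sphere, a single panorama covers the entire field of view around the camera, which is why it can serve as a holistic initialization instead of a local image that must be outpainted step by step.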

SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion

DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting

3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

4K4DGen: Panoramic 4D Generation at 4K Resolution

LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling

Taming Stable Diffusion for Text to 360° Panorama Image Generation

SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

Text2Light: Zero-Shot Text-Driven HDR Panorama Generation

HORIZON: A High-Resolution Panorama Synthesis Framework

PanoDreamer: 3D Panorama Synthesis from a Single Image