RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Jaidev Shriram, Alex Trevithick, Lingjie Liu, Ravi Ramamoorthi
2024-04-11
Abstract: We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.
Computer Vision and Pattern Recognition, Artificial Intelligence, Graphics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of generating 3D scenes from text descriptions. Existing methods are limited: they require multi-view data, or they apply only to simple objects or panoramic images. RealmDreamer aims for high-quality, depth-aware, geometrically accurate 3D scene generation by using pre-trained 2D inpainting and depth diffusion models to optimize a 3D Gaussian Splatting representation, without relying on video or multi-view data.
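The abstract's initialization step, "lifting their samples into 3D", can be illustrated with a generic depth back-projection. The sketch below is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and a per-pixel depth map that is already available, and it shows only how each pixel could become one 3D point that might seed a splat. The occlusion volume and all later optimization stages are out of scope here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project a depth map into camera-space 3D points.

    Assumes a pinhole camera (an assumption of this sketch, not stated
    in the abstract). depth[v][u] is the depth at row v, column u.
    Points are returned in row-major pixel order.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) / fx * z
            y = (v - cy) / fy * z
            points.append((x, y, z))
    return points


# Tiny 2x3 depth map with constant depth 2.0 and hypothetical intrinsics.
depth_map = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
cloud = unproject_depth(depth_map, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Each returned point could serve as the center of one Gaussian, with the pixel's color as its initial appearance; how RealmDreamer actually sets splat parameters is not specified in the abstract.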