Abstract: Generating 3D scenes is a challenging open problem, which requires synthesizing plausible content that is fully consistent in 3D space. While recent methods such as neural radiance fields excel at view synthesis and 3D reconstruction, they cannot synthesize plausible details in unobserved regions since they lack a generative capability. Conversely, existing generative methods are typically not capable of reconstructing detailed, large-scale scenes in the wild, as they use limited-capacity 3D scene representations, require aligned camera poses, or rely on additional regularizers. In this work, we introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes. To achieve this, we make three contributions. First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes, dynamically allocating more capacity as needed to capture details visible in each image. Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images without the need for any additional supervision signal such as masks or depths. This supports 3D reconstruction and generation in a unified architecture. Third, we develop a principled approach to avoid trivial 3D solutions when integrating the image-based rendering with the diffusion model, by dropping out representations of some images. We evaluate the model on several challenging datasets of real and synthetic images, and demonstrate superior results on generation, novel view synthesis and 3D reconstruction.
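The third contribution, dropping out the representations of some images, can be illustrated with a small sketch. The abstract does not specify which images are dropped or how per-view features are combined, so everything below is an assumption made for illustration: the helper names `keep_mask` and `pool_features` are hypothetical, the choice to exclude the view being rendered is one plausible reading, and mean pooling stands in for whatever aggregation the model actually uses.

```python
import random


def keep_mask(num_views, target_view, extra_drop_prob=0.0, rng=None):
    """Return one boolean per view: True if that view's representation
    may be used when rendering `target_view`.

    Assumption (not stated in the abstract): the target view's own
    representation is always dropped, so the renderer cannot simply
    copy the image it is asked to reproduce.
    """
    rng = rng or random.Random()
    mask = []
    for view in range(num_views):
        if view == target_view:
            mask.append(False)  # never use the view being rendered
        else:
            # Optionally drop other views at random as well.
            mask.append(rng.random() >= extra_drop_prob)
    # Guarantee at least one source view survives when one is available.
    if not any(mask) and num_views > 1:
        others = [v for v in range(num_views) if v != target_view]
        mask[rng.choice(others)] = True
    return mask


def pool_features(per_view_features, mask):
    """Average the feature vectors of the kept views (hypothetical
    stand-in for the model's real multi-view aggregation)."""
    kept = [f for f, keep in zip(per_view_features, mask) if keep]
    if not kept:
        raise ValueError("no source views left after dropout")
    dim = len(kept[0])
    return [sum(f[i] for f in kept) / len(kept) for i in range(dim)]


# Usage: four views, rendering view 2 from the other three.
features = [[1.0, 0.0], [3.0, 0.0], [100.0, 100.0], [5.0, 3.0]]
mask = keep_mask(num_views=4, target_view=2)
pooled = pool_features(features, mask)
```

With `extra_drop_prob=0.0` only the target view is masked, so the pooled feature depends solely on the other views. This is the sense in which dropout can discourage a trivial solution where each image only explains itself.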