Abstract: Text-guided diffusion models have shown superior performance in image/video generation and editing, while few explorations have been performed in 3D scenarios. In this paper, we discuss three fundamental and interesting problems on this topic. First, we equip text-guided diffusion models to achieve 3D-consistent generation. Specifically, we integrate a NeRF-like neural field to generate low-resolution coarse results for a given camera view. Such results provide 3D priors as condition information for the subsequent diffusion process. During denoising diffusion, we further enhance 3D consistency by modeling cross-view correspondences with a novel two-stream (corresponding to two different views) asynchronous diffusion process. Second, we study 3D local editing and propose a two-step solution that can generate 360-degree manipulated results by editing an object from a single view. In Step 1, we perform 2D local editing by blending the predicted noises. In Step 2, we conduct a noise-to-text inversion process that maps the 2D blended noises into the view-independent text embedding space. Once the corresponding text embedding is obtained, 360-degree images can be generated. Last but not least, we extend our model to perform one-shot novel view synthesis by fine-tuning on a single image, showing for the first time the potential of leveraging text guidance for novel view synthesis. Extensive experiments and various applications show the prowess of our 3DDesigner. The project page is available at https://3ddesigner-diffusion.github.io/.

Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields

Edit-DiffNeRF: Editing 3D Neural Radiance Fields using 2D Diffusion Model

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

DreamFusion: Text-to-3D using 2D Diffusion

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion

Text2Control3D: Controllable 3D Avatar Generation in Neural Radiance Fields using Geometry-Guided Text-to-Image Diffusion Model

Magic3D: High-Resolution Text-to-3D Content Creation

Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting

Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation