BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

Yonghao Yu,Shunan Zhu,Huai Qin,Haorui Li

DOI: https://doi.org/10.24963/ijcai.2024/598

2024-09-18

Abstract: Following the evolution of text-to-image diffusion models, significant strides have been made in text-to-3D generation. Two primary paradigms currently dominate the field: feed-forward generation, which produces 3D assets swiftly but often yields coarse results, and Score Distillation Sampling (SDS) based solutions, which generate high-fidelity 3D assets but at a slower pace. Integrating these two approaches holds substantial promise for advancing 3D generation. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation, which fits differentiable representations to the 3D assets obtained through feed-forward generation. (2) We design a novel multi-view SDS loss, which uses a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose using the prompt and multi-view consistent normal maps as guidance during refinement. Extensive experiments on different differentiable 3D representations show that BoostDream generates high-quality 3D assets rapidly and overcomes the Janus problem that affects conventional SDS-based methods. This represents a substantial advance in both the efficiency and the quality of 3D generation.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses how to generate high-quality 3D assets efficiently in text-to-3D generation. There are currently two main approaches. Feed-forward generation produces 3D assets quickly, but the results are usually coarse. Methods based on Score Distillation Sampling (SDS) produce high-fidelity 3D assets, but they are slow and prone to the so-called "Janus problem" (the multi-head problem, in which the generated 3D model looks inconsistent across viewing angles, for example showing a face on more than one side).

The paper proposes BoostDream, a method that combines the advantages of both approaches to achieve efficient, high-quality 3D asset generation. BoostDream does this in three stages:

1. **3D Model Distillation**: Convert the explicitly represented 3D assets obtained through feed-forward generation into a differentiable 3D representation, making them trainable.
2. **Multi-view SDS Loss Design**: Use a multi-view-aware 2D diffusion model to refine the 3D assets and address the Janus problem.
3. **Prompts and Multi-view-consistent Normal Maps as Guidance**: During refinement, use the text prompt and multi-view-consistent normal maps to guide generation, so that the 3D assets improve in both detail and overall quality.

With these techniques, BoostDream significantly improves the quality of coarse 3D assets while retaining fast generation, easing the trade-off between efficiency and quality in existing methods.
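
To make the multi-view SDS idea more concrete, here is a minimal NumPy sketch of an SDS-style update applied jointly to several views. It is an illustration under strong assumptions, not the paper's implementation: `toy_multiview_denoiser` is a hypothetical stand-in for a multi-view aware diffusion model, the "renderer" is the identity (the views themselves are the trainable parameters), the weighting w(t) = 1 - alpha_bar is just one common choice, and prompt and normal-map conditioning are omitted. All function names are made up for this example.

```python
import numpy as np

def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def toy_multiview_denoiser(x_t, alpha_bar):
    """Hypothetical stand-in for a multi-view aware noise predictor.

    It sees all views at once (axis 0) and treats their per-pixel mean as the
    shared, scaled clean image, so everything that differs between views is
    predicted to be noise.
    """
    shared = x_t.mean(axis=0, keepdims=True)
    return (x_t - shared) / np.sqrt(1.0 - alpha_bar)

def multiview_sds_grad(views, alpha_bar, rng):
    """SDS-style gradient w(t) * (eps_hat - eps), evaluated on all views jointly."""
    eps = rng.standard_normal(views.shape)      # sampled noise
    x_t = add_noise(views, eps, alpha_bar)      # noised renders
    eps_hat = toy_multiview_denoiser(x_t, alpha_bar)
    w = 1.0 - alpha_bar                         # assumed weighting w(t)
    return w * (eps_hat - eps)

def cross_view_spread(views):
    """Mean per-pixel standard deviation across views (lower = more consistent)."""
    return float(np.std(views, axis=0).mean())

def refine(views, steps=20, lr=0.5, seed=0):
    """Run a few SDS-style gradient steps; the input array is not modified."""
    rng = np.random.default_rng(seed)
    views = np.array(views, dtype=float)        # work on a copy
    for _ in range(steps):
        alpha_bar = rng.uniform(0.1, 0.9)       # random noise level per step
        views -= lr * multiview_sds_grad(views, alpha_bar, rng)
    return views

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coarse = rng.standard_normal((4, 8, 8))     # 4 views of an 8x8 "render"
    refined = refine(coarse)
    print(cross_view_spread(coarse), cross_view_spread(refined))
```

In this toy setup the cross-view spread shrinks at every step, because the stand-in denoiser only penalizes disagreement between views. A real pipeline would instead render the views from a differentiable 3D representation and backpropagate the same kind of gradient through the renderer into the 3D parameters.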

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation

StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Creating High-quality 3D Content by Bridging the Gap Between Text-to-2D and Text-to-3D Generation

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

Retrieval-Augmented Score Distillation for Text-to-3D Generation

Magic3D: High-Resolution Text-to-3D Content Creation

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

EucliDreamer: Fast and High-Quality Texturing for 3D Models with Depth-Conditioned Stable Diffusion

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture

MVDream: Multi-view Diffusion for 3D Generation