Abstract: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer. The code is available at https://github.com/HansenHuang0823/PlacidDreamer.

What problem does this paper attempt to address?

The paper proposes new solutions to two key issues in the field of text-to-3D generation:

1. **Conflicting Optimization Directions**: In existing methods, different generative models (such as multi-view diffusion models and text-to-image diffusion models) may produce inconsistent guidance directions, leading to contradictions throughout the generation process. To address this issue, the paper introduces a module called "Latent-Plane," which extends the multi-view diffusion model and keeps the guidance direction consistent throughout the generation process.
2. **Oversaturation Issue**: When content is generated with a score distillation algorithm, it may exhibit excessive color saturation. To solve this problem, the paper treats score distillation as a multi-objective optimization problem and introduces a new algorithm, Balanced Score Distillation (BSD), that balances rich detail against color saturation in the generated content.

### Main Contributions

- **PlacidDreamer Framework**: This framework integrates initialization, multi-view generation, and text-conditioned generation, all coordinated by a single multi-view diffusion model. It also employs a novel balanced score distillation algorithm to control saturation.
- **Latent-Plane Module**: This module can quickly reconstruct geometric structures and improve the quality of multi-view images, thereby better initializing the 3D Gaussians and personalizing the text-to-image diffusion model.
- **Balanced Score Distillation Algorithm**: By treating score distillation as a multi-objective optimization problem, this algorithm dynamically adjusts the optimization direction to achieve a Pareto optimal solution, resulting in outputs that are both richly detailed and reasonably saturated.

### Experimental Validation

The paper validates the effectiveness of PlacidDreamer through extensive experiments, including quantitative evaluations and qualitative analyses. The results show that this method outperforms existing techniques on multiple benchmarks, improving generation quality and alignment metrics by at least 5 points. Additionally, integrating the balanced score distillation algorithm into other open-source text-to-3D frameworks significantly enhances their performance.

In summary, PlacidDreamer aims to address the current issues in text-to-3D generation by improving the multi-view diffusion model and optimizing the score distillation algorithm, thereby enhancing the overall generation quality.
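
To make the multi-objective view of score distillation more concrete, here is a minimal sketch of how two competing gradient directions can be merged into one update that favors neither objective. It uses the standard two-objective min-norm rule from multiple-gradient descent, chosen purely for illustration. It is not the paper's Balanced Score Distillation update, whose exact form this summary does not give. The function names `min_norm_weight` and `combine` and the toy gradients are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1: Vector, g2: Vector) -> float:
    """Weight alpha in [0, 1] minimizing ||alpha*g1 + (1-alpha)*g2||^2.

    This is the closed-form two-objective min-norm rule; it is an
    illustrative stand-in, not the algorithm from the paper.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any weight gives the same direction.
        return 0.5
    alpha = dot(g2, [y - x for x, y in zip(g1, g2)]) / denom
    # Clip so the result stays a convex combination of g1 and g2.
    return min(1.0, max(0.0, alpha))


def combine(g1: Vector, g2: Vector) -> Vector:
    """Convex combination of g1 and g2 with the min-norm weight."""
    alpha = min_norm_weight(g1, g2)
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]


if __name__ == "__main__":
    # Toy stand-ins for two guidance gradients (assumed values).
    detail_grad = [1.0, 0.0]
    saturation_grad = [0.0, 1.0]
    print(combine(detail_grad, saturation_grad))  # [0.5, 0.5]
```

When the combined vector is non-zero, it has a non-negative inner product with both inputs, so a small step along it does not favor one objective at the other's expense to first order. A zero result means no such common direction exists, which is the usual first-order condition for a Pareto-stationary point.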

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

Creating High-quality 3D Content by Bridging the Gap Between Text-to-2D and Text-to-3D Generation

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis

UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

Retrieval-Augmented Score Distillation for Text-to-3D Generation

GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation