PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia
2024-07-19
Abstract: Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer. The code is available at https://github.com/HansenHuang0823/PlacidDreamer.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper proposes new solutions to two key issues in the field of text-to-3D generation:

1. **Conflicting Optimization Directions**: In existing methods, different generative models (such as multi-view diffusion models and text-to-image diffusion models) may produce inconsistent guidance directions, leading to contradictions throughout the generation process. To address this issue, the paper introduces a module called "Latent-Plane," which extends the multi-view diffusion model and keeps the guidance direction consistent throughout the generation process.
2. **Oversaturation Issue**: When score distillation is used for generation, the generated content may exhibit excessive color saturation. To solve this problem, the paper treats score distillation as a multi-objective optimization problem and introduces a new algorithm, Balanced Score Distillation (BSD), to balance rich details against saturation in the generated content.

### Main Contributions

- **PlacidDreamer Framework**: This framework unifies initialization, multi-view generation, and text-conditioned generation under a single multi-view diffusion model. It also employs a novel Balanced Score Distillation algorithm to control saturation.
- **Latent-Plane Module**: This module quickly reconstructs geometry and improves the quality of multi-view images, which are then used to initialize the 3D Gaussians and to personalize the text-to-image diffusion model.
- **Balanced Score Distillation Algorithm**: By treating score distillation as a multi-objective optimization problem, this algorithm dynamically adjusts the optimization direction to reach a Pareto optimal solution, resulting in outputs that are both richly detailed and reasonably saturated.

### Experimental Validation

The paper validates the effectiveness of PlacidDreamer through extensive experiments, including quantitative evaluations and qualitative analyses. The results show that this method outperforms existing techniques on multiple benchmarks, improving generation quality and alignment metrics by at least 5 points. Additionally, integrating the Balanced Score Distillation algorithm into other open-source text-to-3D frameworks significantly enhances their performance.

In summary, PlacidDreamer aims to address the current issues in text-to-3D generation by improving the multi-view diffusion model and optimizing the score distillation algorithm, thereby enhancing the overall generation quality.
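
The idea of treating score distillation as a multi-objective optimization problem can be illustrated with a generic two-objective example. The sketch below is not the paper's Balanced Score Distillation algorithm; it is the standard closed-form min-norm combination of two gradients (as used in multiple-gradient descent), shown only to make "Pareto optimal direction" concrete. The function name `min_norm_direction` and the toy gradients `g_detail` and `g_saturation` are hypothetical and do not come from the paper or its code.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_direction(g1: Vector, g2: Vector, eps: float = 1e-12) -> Tuple[Vector, float]:
    """Return the minimum-norm convex combination of two gradients.

    Solves min over alpha in [0, 1] of ||alpha*g1 + (1-alpha)*g2||^2.
    The resulting vector d satisfies dot(d, g1) >= 0 and dot(d, g2) >= 0,
    so stepping along -d does not increase either objective to first order.
    This is a generic illustration, NOT the paper's BSD update rule.
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom < eps:
        # Gradients (almost) coincide: any weighting gives the same direction.
        alpha = 0.5
    else:
        # Unconstrained minimizer, clipped to the valid range [0, 1].
        alpha = min(1.0, max(0.0, -dot(diff, g2) / denom))
    direction = [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]
    return direction, alpha


# Toy gradients standing in for two competing objectives (hypothetical values).
g_detail = [1.0, 0.0]
g_saturation = [0.0, 1.0]
direction, alpha = min_norm_direction(g_detail, g_saturation)
print(direction, alpha)
```

When the two gradients conflict, the weight `alpha` shifts toward the shorter gradient, so neither objective dominates the update. A fixed weighting, by contrast, lets whichever term has the larger magnitude drive the optimization.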