DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model

Yiming Zhong,Xiaolin Zhang,Yao Zhao,Yunchao Wei

2024-08-09

Abstract: Recently, the text-to-3D task has developed rapidly due to the appearance of the SDS method. However, the SDS method always generates 3D objects with poor quality due to the over-smooth issue. This issue is attributed to two factors: 1) the DDPM single-step inference produces poor guidance gradients; 2) the randomness from the input noises and timesteps averages the details of the 3D contents. In this paper, to address the issue, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the powerful image generation capabilities inherent in LCM, enabling the generation of consistent and high-quality guidance, i.e., predicted noises or images. Powered by the improved guidance, the proposed method can provide accurate and detailed gradients to optimize the target 3D models. In addition, we propose two strategies to enhance the generation quality further. Firstly, we propose a guidance calibration strategy, utilizing an Euler solver to calibrate the guidance distribution and accelerate the convergence of 3D models. Secondly, we propose a dual timestep strategy, increasing the consistency of guidance and optimizing 3D models from geometry to appearance in DreamLCM. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency. The code is available at https://github.com/1YimingZhong/DreamLCM.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the poor quality of 3D objects produced by existing text-to-3D methods such as Score Distillation Sampling (SDS), which mainly shows up as over-smoothing. Specifically, objects generated with SDS lack detail for two reasons:

1. **Low-quality guidance**: The guidance gradients that a diffusion model such as DDPM produces through single-step inference are of poor quality, resulting in blurred 3D objects.
2. **Inconsistent guidance**: The randomness of the input noise and timesteps makes the guidance inconsistent across iterations, which averages out the details of the 3D content and causes over-smoothing.

To address these problems, the paper proposes DreamLCM, which incorporates the Latent Consistency Model (LCM) to produce consistent, high-quality guidance, and further improves generation quality through two strategies:

1. **Guidance calibration strategy**: Use an Euler solver to calibrate the guidance distribution, accelerating the convergence of the 3D model.
2. **Dual timestep strategy**: Increase the consistency of the guidance and optimize the 3D model from geometry to appearance.

With these improvements, DreamLCM generates high-quality 3D objects while remaining efficient to train. Experiments show that it achieves state-of-the-art results in both generation quality and training efficiency.
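For readers unfamiliar with SDS, the following is a minimal sketch of a generic SDS-style update on a toy problem. It is not DreamLCM's implementation and does not use LCM. The function names, the linear beta schedule, the weighting w(t) = 1 - alpha_bar_t, and the closed-form `toy_noise_predictor` (a stand-in for a pretrained diffusion model) are all assumptions made for illustration. The "3D model" is reduced to the rendered image itself, so the render Jacobian is the identity. Each iteration draws a fresh noise sample and timestep, which is the source of randomness the summary above refers to.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def toy_noise_predictor(x_t, alpha_bar_t, target):
    """Hypothetical stand-in for a diffusion model: the exact noise estimate
    when the data distribution is concentrated on a single image `target`."""
    return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)

def sds_step(x0, target, alpha_bar, rng, lr=1.0, t_min=20, t_max=980):
    """One SDS-style update applied directly to the rendered image x0."""
    t = rng.integers(t_min, t_max)           # random timestep each iteration
    eps = rng.standard_normal(x0.shape)      # random input noise each iteration
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps    # forward diffusion
    eps_pred = toy_noise_predictor(x_t, a, target)    # single-step prediction
    grad = (1.0 - a) * (eps_pred - eps)      # w(t) * (eps_pred - eps)
    return x0 - lr * grad

def run_toy_sds(num_iters=500, size=8, seed=0):
    """Optimize a blank image toward the toy target with repeated SDS steps."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar()
    target = rng.standard_normal((size, size))
    x0 = np.zeros((size, size))
    for _ in range(num_iters):
        x0 = sds_step(x0, target, alpha_bar, rng)
    return x0, target
```

In this toy setting the predictor is exact, so every gradient points toward the same target and the image converges. With a real diffusion model, the single-step prediction varies with the sampled noise and timestep, so successive gradients can pull toward different images, which is the averaging effect described above.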

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Creating High-quality 3D Content by Bridging the Gap Between Text-to-2D and Text-to-3D Generation

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping

StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D

LucidDreaming: Controllable Object-Centric 3D Generation

DreamPolish: Domain Score Distillation with Progressive Geometry Generation

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps

ExactDreamer: High-Fidelity Text-to-3D Content Creation via Exact Score Matching

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation