DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model

Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei
2024-08-09
Abstract: Recently, the text-to-3D task has developed rapidly due to the appearance of the SDS method. However, the SDS method always generates 3D objects with poor quality due to the over-smoothing issue. This issue is attributed to two factors: 1) DDPM single-step inference produces poor guidance gradients; 2) the randomness of the input noise and timesteps averages out the details of the 3D content. In this paper, to address the issue, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the powerful image generation capabilities inherent in LCM, enabling it to generate consistent and high-quality guidance, i.e., predicted noises or images. Powered by the improved guidance, the proposed method can provide accurate and detailed gradients to optimize the target 3D models. In addition, we propose two strategies to further enhance the generation quality. Firstly, we propose a guidance calibration strategy, which uses an Euler solver to calibrate the guidance distribution and accelerate the convergence of 3D models. Secondly, we propose a dual timestep strategy, which increases the consistency of the guidance and optimizes 3D models from geometry to appearance in DreamLCM. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency. The code is available at https://github.com/1YimingZhong/DreamLCM.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the poor quality of 3D objects produced by existing text-to-3D methods such as Score Distillation Sampling (SDS), which shows up mainly as over-smoothing. Specifically, 3D objects generated by SDS lack detail for two reasons:

1. **Low-quality guidance**: The guidance gradients produced by single-step inference with existing diffusion models (such as DDPM) are of poor quality, resulting in blurred 3D objects.
2. **Inconsistent guidance**: The randomness of the input noise and timesteps makes the guidance inconsistent across iterations, which averages out the details of the 3D content and leads to over-smoothing.

To address these problems, the paper proposes DreamLCM, which incorporates the Latent Consistency Model (LCM) to generate consistent, high-quality guidance, and further improves generation quality through two strategies:

1. **Guidance calibration strategy**: Use an Euler solver to calibrate the guidance distribution, which accelerates the convergence of the 3D model.
2. **Dual timestep strategy**: Increase the consistency of the guidance and optimize the 3D model progressively from geometry to appearance.

With these improvements, DreamLCM generates high-quality 3D objects while maintaining training efficiency. The reported experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency.
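
To show where the random noise and random timestep enter each SDS iteration, here is a minimal NumPy sketch of the commonly used SDS gradient form, w(t) * (eps_hat - eps). It is an illustration under stated assumptions, not DreamLCM's implementation: `toy_eps_pred`, `sds_gradient`, `run_toy_sds`, the uniform range for `alpha_bar_t`, and the unit weight are all hypothetical names and choices, and the "rendering" is optimized directly rather than through a differentiable renderer.

```python
import numpy as np


def toy_eps_pred(x_t, alpha_bar_t, target):
    """Hypothetical stand-in for a diffusion model's noise predictor.

    It returns the noise that would explain x_t if the clean sample were
    `target`. A real text-conditioned model would replace this function.
    """
    return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)


def sds_gradient(x0, target, alpha_bar_t, rng, weight=1.0):
    """Single-sample SDS-style gradient with respect to the rendering x0."""
    eps = rng.standard_normal(x0.shape)  # random input noise
    # Forward diffusion: noise the rendering to the sampled timestep.
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_hat = toy_eps_pred(x_t, alpha_bar_t, target)  # single-step prediction
    return weight * (eps_hat - eps)  # w(t) * (eps_hat - eps)


def run_toy_sds(steps=200, lr=0.1, seed=0):
    """Run plain gradient descent on a 4x4 'rendering' (assumed toy setup)."""
    rng = np.random.default_rng(seed)
    target = np.full((4, 4), 2.0)
    x0 = np.zeros((4, 4))
    for _ in range(steps):
        # A fresh timestep is drawn every iteration (assumed uniform schedule).
        alpha_bar_t = rng.uniform(0.1, 0.9)
        x0 = x0 - lr * sds_gradient(x0, target, alpha_bar_t, rng)
    return x0, target


if __name__ == "__main__":
    x0, target = run_toy_sds()
    print("max abs error:", np.abs(x0 - target).max())
```

In this toy setup the predictor is exact, so the sampled noise cancels and the rendering converges to the target. It therefore does not reproduce over-smoothing. With a real, imperfect single-step predictor, the residual differs for every sampled noise and timestep, which is the inconsistency described above.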