DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping

Zeyu Cai, Duotun Wang, Yixun Liang, Zhijing Shao, Ying-Cong Chen, Xiaohang Zhan, Zeyu Wang
2024-09-20
Abstract: Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings such as over-saturated color and excess smoothness. In this paper, we conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation. This special design enables the efficient training of variational distribution by skipping the calculations of the Jacobians in the diffusion U-Net. We also introduce timestep-dependent Distribution Coefficient Annealing (DCA) to further improve distilling precision. Leveraging VDM and DCA, we use Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate the capability of VDM and DCA to generate high-fidelity and realistic assets with optimization efficiency.
Computer Vision and Pattern Recognition, Graphics
What problem does this paper attempt to address?
This paper addresses a known weakness of Score Distillation Sampling (SDS) in text-to-3D generation: the resulting 3D models often have over-saturated colors and overly smooth surfaces, which limits the quality and detail of the generated assets. To address this, the authors propose two techniques: Variational Distribution Mapping (VDM) and timestep-dependent Distribution Coefficient Annealing (DCA). Both aim to model the distribution of rendered images more effectively, so that the generated 3D assets are high-fidelity and realistic while optimization remains efficient. Specifically, VDM introduces a trainable degradation process and treats each rendered image as a degraded version of an image generated by the diffusion model, which avoids computing Jacobians through the diffusion U-Net. The paper also analyzes the mode-seeking behavior of SDS and finds that the correlation between the distribution of generated images and that of rendered images weakens as the timestep decreases. Based on this observation, DCA applies timestep-dependent coefficients that adapt to the changing distribution of rendered images, further improving generation quality. In summary, the goal is to improve existing text-to-3D generation so that it produces more detailed and realistic 3D models while keeping the optimization process efficient.
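
For readers unfamiliar with the baseline being refined, the following is a minimal sketch of a standard SDS update, not the paper's VDM or DCA. It makes several assumptions for illustration: the "renderer" is the identity (the image itself is the parameter), the pretrained diffusion U-Net is replaced by an exact noise predictor for a data distribution concentrated on a single target image, the noise schedule `ALPHAS_CUMPROD` is a made-up linear ramp, and the weighting w(t) = 1 - alpha_bar_t is one common choice. All function names are hypothetical. The point it shows is that the predicted noise is used as a constant, so no gradient flows through the noise predictor.

```python
import numpy as np

# Hypothetical noise schedule: cumulative alpha products, from almost no noise to mostly noise.
ALPHAS_CUMPROD = np.linspace(0.999, 0.01, 1000)

def make_point_mass_predictor(target):
    """Exact noise predictor when the data distribution is the single image `target`.
    Toy stand-in for a pretrained diffusion U-Net (assumption for this sketch)."""
    def eps_pred(x_t, t):
        a_bar = ALPHAS_CUMPROD[t]
        return (x_t - np.sqrt(a_bar) * target) / np.sqrt(1.0 - a_bar)
    return eps_pred

def sds_gradient(x, eps_pred, t, rng):
    """One-sample SDS gradient with respect to the rendered image x."""
    a_bar = ALPHAS_CUMPROD[t]
    eps = rng.standard_normal(x.shape)
    # Forward diffusion: add noise to the rendered image at timestep t.
    x_t = np.sqrt(a_bar) * x + np.sqrt(1.0 - a_bar) * eps
    # The prediction is treated as a constant: no Jacobian of the predictor is computed.
    eps_hat = eps_pred(x_t, t)
    w = 1.0 - a_bar  # assumed weighting w(t)
    return w * (eps_hat - eps)

def optimize(x0, target, steps=300, lr=1.0, seed=0):
    """Plain gradient descent on the image using SDS gradients at random timesteps."""
    rng = np.random.default_rng(seed)
    eps_pred = make_point_mass_predictor(target)
    x = x0.copy()
    for _ in range(steps):
        t = int(rng.integers(0, len(ALPHAS_CUMPROD)))
        x = x - lr * sds_gradient(x, eps_pred, t, rng)
    return x
```

In this toy setting the sampled noise cancels and each step pulls the image toward the single target, which is an extreme illustration of the mode-seeking behavior mentioned above. With a real 3D representation, the same residual would be multiplied by the Jacobian of the renderer with respect to the 3D parameters.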