Score Distillation via Reparametrized DDIM

Artem Lukoianov, Haitz Sáez de Ocáriz Borde, Kristjan Greenewald, Vitor Campagnolo Guizilini, Timur Bagautdinov, Vincent Sitzmann, Justin Solomon
2024-10-10
Abstract: While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, we show that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice of a noise term. In particular, after a change of variables, SDS resembles a high-variance version of Denoising Diffusion Implicit Models (DDIM) with a differently-sampled noise term: SDS introduces noise i.i.d. randomly at each step, while DDIM infers it from the previous noise predictions. This excessive variance can lead to over-smoothing and unrealistic outputs. We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step. This modification makes SDS's generative process for 2D images almost identical to DDIM. In 3D, it removes over-smoothing, preserves higher-frequency detail, and brings the generation quality closer to that of 2D samplers. Experimentally, our method achieves better or similar 3D generation quality compared to other state-of-the-art Score Distillation methods, all without training additional neural networks or multi-view supervision, and providing useful insights into the relationship between 2D and 3D asset generation with diffusion models.
Computer Vision and Pattern Recognition, Graphics, Machine Learning
What problem does this paper attempt to address?
The paper addresses the low quality of 3D shapes generated with 2D diffusion models, which typically shows up as oversaturated colors and overly smooth textures. While 2D diffusion models can generate high-quality, detailed images, existing 3D generation methods like Score Distillation Sampling (SDS) often produce cartoonish, overly smooth shapes. By analyzing the SDS algorithm, the paper finds that resampling the noise independently at each step introduces high variance into the generation process, which degrades quality. To solve this, the paper proposes a new method, Score Distillation via Inversion (SDI), which obtains a better noise estimate by running DDIM inversion at each update step and thereby improves the quality of 3D generation.

### Main Contributions of the Paper:
1. **Theoretical Analysis**: It shows that the per-view guidance in SDS can be seen as a reparameterization of DDIM sampling. The two differ in the noise term: SDS samples the noise randomly and independently at each step, while DDIM keeps a trajectory consistent with the previously predicted noise.
2. **New Method**: It proposes Score Distillation via Inversion (SDI), which improves 3D generation quality by using conditional noise (i.e., noise obtained through DDIM inversion) at each step, narrowing the quality gap with samples generated by 2D models.
3. **Experimental Validation**: It systematically compares SDI with existing state-of-the-art Score Distillation algorithms, showing that SDI achieves similar or better generation quality without requiring additional neural network training or multi-stage generation.

### Key Innovations of the Paper:
- **Improved Noise Estimation**: Obtains conditional noise through DDIM inversion instead of resampling random noise at each step, which reduces variance in the generation process and improves quality.
- **Combination of Theory and Practice**: Explains theoretically why SDS produces low-quality 3D shapes and validates the proposed SDI method experimentally.

### Experimental Results:
- **Qualitative Comparison**: The generated 3D shapes are more realistic in detail and texture, avoiding oversaturated colors and overly smooth textures.
- **Quantitative Comparison**: On metrics such as CLIP scores and image quality assessments, SDI outperforms or matches existing state-of-the-art methods, while remaining competitive in runtime and memory usage.

Overall, the paper provides a simple yet effective way to improve the quality of 3D shape generation, grounded in an analysis of existing methods, and offers new insights for future research.
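
The difference between the two noise choices can be illustrated with a small toy sketch. This is not the authors' implementation: the paper applies the idea to rendered views of a 3D shape with a pretrained, text-conditioned diffusion model, whereas the sketch below assumes 1-D Gaussian data so that the ideal noise predictor has a closed form. The schedule `alpha_bar`, the constants `MU` and `S`, all function names, and the way the noise is read off after inversion are assumptions made for illustration only.

```python
import numpy as np

MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2); an assumption for illustration


def alpha_bar(t):
    """Hypothetical cosine noise schedule on t in [0, 1)."""
    return np.cos(0.5 * np.pi * t) ** 2


def eps_model(x_t, t):
    """Closed-form E[eps | x_t] for Gaussian data; stands in for a trained denoiser."""
    a = alpha_bar(t)
    return np.sqrt(1 - a) * (x_t - np.sqrt(a) * MU) / (a * S**2 + 1 - a)


def ddim_step(x, t_from, t_to):
    """Deterministic DDIM update. Using t_to > t_from gives the inversion direction."""
    a_from, a_to = alpha_bar(t_from), alpha_bar(t_to)
    eps = eps_model(x, t_from)
    x0_pred = (x - np.sqrt(1 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1 - a_to) * eps


def inverted_noise(x0, t, n_steps=10):
    """Run DDIM inversion from time 0 up to t, then read off the implied noise."""
    ts = np.linspace(0.0, t, n_steps + 1)
    x = x0
    for t_from, t_to in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, t_from, t_to)
    a = alpha_bar(t)
    return (x - np.sqrt(a) * x0) / np.sqrt(1 - a)


def update_direction(x0, t, noise):
    """Score-distillation residual: predicted noise minus the noise that was added."""
    a = alpha_bar(t)
    x_t = np.sqrt(a) * x0 + np.sqrt(1 - a) * noise
    return eps_model(x_t, t) - noise


def compare(x0=0.0, t=0.5, n_trials=1000, seed=0):
    """Return the spread of the residual under random noise vs. inverted noise."""
    rng = np.random.default_rng(seed)
    random_noise = [update_direction(x0, t, rng.standard_normal()) for _ in range(n_trials)]
    inverted = [update_direction(x0, t, inverted_noise(x0, t)) for _ in range(n_trials)]
    return np.std(random_noise), np.std(inverted)


if __name__ == "__main__":
    sds_std, sdi_std = compare()
    print(f"std with random noise:   {sds_std:.4f}")
    print(f"std with inverted noise: {sdi_std:.4f}")
```

In this toy setting the residual computed with freshly sampled noise varies from call to call, while the residual computed with inverted noise is the same every time for a given input, so its spread is zero up to floating-point rounding. This only shows the variance mechanism described in the summary; it says nothing about the 3D quality results reported in the paper.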