Abstract: While 2D diffusion models generate realistic, high-detail images, 3D shape generation methods like Score Distillation Sampling (SDS) built on these 2D diffusion models produce cartoon-like, over-smoothed shapes. To help explain this discrepancy, we show that the image guidance used in Score Distillation can be understood as the velocity field of a 2D denoising generative process, up to the choice of a noise term. In particular, after a change of variables, SDS resembles a high-variance version of Denoising Diffusion Implicit Models (DDIM) with a differently sampled noise term: SDS samples the noise i.i.d. at each step, while DDIM infers it from the previous noise predictions. This excessive variance can lead to over-smoothing and unrealistic outputs. We show that a better noise approximation can be recovered by inverting DDIM in each SDS update step. This modification makes SDS's generative process for 2D images almost identical to DDIM. In 3D, it removes over-smoothing, preserves higher-frequency detail, and brings the generation quality closer to that of 2D samplers. Experimentally, our method achieves better or similar 3D generation quality compared to other state-of-the-art Score Distillation methods, all without training additional neural networks or multi-view supervision, and provides useful insights into the relationship between 2D and 3D asset generation with diffusion models.
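To make the noise-term contrast in the abstract concrete, here are the standard textbook forms of the two updates. These come from the original DreamFusion (SDS) and DDIM formulations, not from this paper's own derivation, whose exact parametrization and weighting may differ. Let $x_0 = g(\theta)$ be the rendered view with 3D parameters $\theta$, $\bar\alpha_t$ the cumulative noise schedule, $\epsilon_\phi(x_t; y, t)$ the text-conditioned noise prediction for prompt $y$, and $w(t)$ a timestep weighting:

$$
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x_0}{\partial \theta} \right], \qquad \epsilon \sim \mathcal{N}(0, I).
$$

The deterministic DDIM step from $t$ to $t-1$ is

$$
\hat{x}_0(x_t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\phi(x_t; y, t)}{\sqrt{\bar\alpha_t}}, \qquad
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0(x_t) + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\phi(x_t; y, t).
$$

Subtracting the forward-noising equation from the definition of $\hat{x}_0$ gives

$$
\epsilon_\phi(x_t; y, t) - \epsilon = \frac{\sqrt{\bar\alpha_t}}{\sqrt{1-\bar\alpha_t}}\,\big(x_0 - \hat{x}_0(x_t)\big),
$$

so for a renderer that outputs pixels directly, an SDS gradient step moves the rendering toward the one-step denoised estimate $\hat{x}_0(x_t)$. In DDIM the noise carried to the next step is the model's own prediction along a single trajectory, whereas in SDS $\epsilon$ is redrawn independently at every iteration, which is where the extra variance comes from.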

What problem does this paper attempt to address?

The paper addresses the low quality of 3D shapes generated with 2D diffusion models, which typically shows up as oversaturated colors and overly smooth textures. While 2D diffusion models can generate high-quality, detailed images, existing 3D generation methods built on them, such as Score Distillation Sampling (SDS), often produce cartoonish, overly smooth shapes. By analyzing the SDS algorithm, the paper finds that resampling the noise at every step introduces high variance into the generation process, which degrades quality. To address this, the paper proposes Score Distillation via Inversion (SDI), which improves the noise estimate by running DDIM inversion at each step and significantly improves 3D generation quality.

### Main Contributions of the Paper

1. **Theoretical Analysis**: It shows that the guidance for each view in SDS can be viewed as a simplified reparameterization of DDIM sampling: SDS randomly samples the noise at each step, while DDIM keeps it consistent with the previously predicted noise along a single trajectory.
2. **New Method**: It proposes Score Distillation via Inversion (SDI), which uses conditional noise (i.e., noise obtained through DDIM inversion) at each step, significantly improving 3D generation quality and narrowing the gap with samples generated by 2D models.
3. **Experimental Validation**: It systematically compares SDI with existing state-of-the-art Score Distillation algorithms, showing that SDI achieves similar or better generation quality without training additional neural networks or requiring multi-stage generation.

### Key Innovations of the Paper

- **Improved Noise Estimation**: Obtains conditional noise through DDIM inversion instead of resampling random noise at each step, reducing the variance of the generation process and improving quality.
- **Combination of Theory and Practice**: Explains theoretically why SDS produces low-quality 3D shapes and validates the effectiveness of SDI experimentally.

### Experimental Results

- **Qualitative Comparison**: The generated 3D shapes have more realistic detail and texture, avoiding oversaturated colors and overly smooth textures.
- **Quantitative Comparison**: On metrics such as CLIP scores and image quality assessments, SDI outperforms or matches existing state-of-the-art methods in several respects, while remaining competitive in runtime and memory usage.

Overall, through an in-depth analysis of existing methods, the paper provides a simple yet effective way to improve the quality of 3D shape generation and offers new insights for future research.
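The difference between SDS noise and inversion-based noise can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the "rendering" `x0` is optimized directly in pixel space, the pretrained noise predictor is replaced by the closed-form optimal predictor for a hypothetical Gaussian data distribution (`MU`, `S`), and the cosine-style schedule, uniform inversion grid, and `w(t) = 1` are likewise assumptions. The only thing being illustrated is where the noise comes from: `rng.standard_normal` for SDS versus the hypothetical helper `inferred_noise`, which inverts DDIM from `x0` up to `x_t` and then solves the forward-noising equation for the noise.

```python
import numpy as np

# Toy assumption (not from the paper): "images" are vectors drawn from
# N(MU, S**2 I). For that distribution E[eps | x_t] has a closed form,
# which stands in here for a pretrained diffusion model's eps_phi.
MU, S = 0.5, 0.3


def alpha_bar(t):
    """Hypothetical cosine-style cumulative schedule on continuous t in [0, 1)."""
    return np.cos(0.5 * np.pi * t) ** 2


def eps_model(x_t, t):
    """Closed-form optimal noise prediction E[eps | x_t] for the toy data."""
    a, b = np.sqrt(alpha_bar(t)), np.sqrt(1.0 - alpha_bar(t))
    return b * (x_t - a * MU) / (a**2 * S**2 + b**2)


def ddim_invert(x0, t_target, n_steps=50):
    """Deterministic DDIM inversion: walk x0 from t = 0 up to t_target."""
    ts = np.linspace(0.0, t_target, n_steps + 1)
    x = x0.copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        eps = eps_model(x, t_cur)
        a_cur, b_cur = np.sqrt(alpha_bar(t_cur)), np.sqrt(1.0 - alpha_bar(t_cur))
        a_nxt, b_nxt = np.sqrt(alpha_bar(t_next)), np.sqrt(1.0 - alpha_bar(t_next))
        x0_hat = (x - b_cur * eps) / a_cur      # predicted clean image at t_cur
        x = a_nxt * x0_hat + b_nxt * eps        # re-noise with the same eps
    return x


def inferred_noise(x0, t, n_steps=50):
    """Inversion-based noise: the eps that maps x0 onto its DDIM-inverted x_t (t > 0)."""
    x_t = ddim_invert(x0, t, n_steps)
    a, b = np.sqrt(alpha_bar(t)), np.sqrt(1.0 - alpha_bar(t))
    return (x_t - a * x0) / b


def distillation_grad(x0, t, eps):
    """Image-space SDS-style gradient w(t) * (eps_phi(x_t, t) - eps) with w(t) = 1."""
    a, b = np.sqrt(alpha_bar(t)), np.sqrt(1.0 - alpha_bar(t))
    x_t = a * x0 + b * eps
    return eps_model(x_t, t) - eps


rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=8)  # hypothetical current rendering
t = 0.6

# SDS: fresh i.i.d. noise for every evaluation, so the update direction fluctuates.
sds_grads = np.stack(
    [distillation_grad(x0, t, rng.standard_normal(x0.shape)) for _ in range(256)]
)
# Inversion-based noise: a deterministic function of (x0, t), so repeated
# evaluations give the same update direction.
sdi_a = distillation_grad(x0, t, inferred_noise(x0, t))
sdi_b = distillation_grad(x0, t, inferred_noise(x0, t))

print("SDS mean per-pixel std:", sds_grads.std(axis=0).mean())
print("SDI max difference between two evaluations:", np.abs(sdi_a - sdi_b).max())
```

Because the inferred noise depends only on the current rendering and timestep, repeated gradient evaluations agree, while the SDS estimate changes with every draw. In a real pipeline the timestep is typically sampled and `x0` comes from a differentiable renderer, so this toy only isolates the contribution of the noise term to the variance.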
