Geometry-Aware Score Distillation via 3D Consistent Noising and Gradient Consistency Modeling

Min-Seop Kwak, Donghoon Ahn, Ines Hyeonsu Kim, Jin-Hwa Kim, Seungryong Kim
2024-07-01
Abstract: Score distillation sampling (SDS), the methodology in which the score from pretrained 2D diffusion models is distilled into a 3D representation, has recently brought significant advancements in the text-to-3D generation task. However, this approach is still confronted with critical geometric inconsistency problems such as the Janus problem. Starting from a hypothesis that such inconsistency problems may be induced by multiview inconsistencies between 2D scores predicted from various viewpoints, we introduce GSD, a simple and general plug-and-play framework for incorporating 3D consistency and therefore geometry awareness into the SDS process. Our methodology is composed of three components: 3D consistent noising, designed to produce 3D consistent noise maps that perfectly follow the standard Gaussian distribution; geometry-based gradient warping for identifying correspondences between predicted gradients of different viewpoints; and a novel gradient consistency loss to optimize the scene geometry toward producing more consistent gradients. We demonstrate that our method significantly improves performance, successfully addressing the geometric inconsistency problems in the text-to-3D generation task with minimal computation cost and being compatible with existing score distillation-based models. Our project page is available at https://ku-cvlab.github.io/GSD/.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses geometric inconsistency in text-to-3D generation, in particular the so-called "Janus problem". Existing methods based on Score Distillation Sampling (SDS) optimize a 3D representation using pretrained 2D diffusion models, but the results often suffer from geometric inconsistency. The authors hypothesize that this stems from multiview inconsistencies between the 2D scores predicted from different viewpoints, and they propose a framework called Geometry-aware Score Distillation (GSD) that improves multi-view consistency through 3D consistent noising and gradient consistency modeling. GSD consists of three components: 3D consistent noising, which produces 3D consistent noise maps that follow the standard Gaussian distribution; geometry-based gradient warping, which identifies correspondences between the gradients predicted from different viewpoints; and a novel gradient consistency loss, which optimizes the scene geometry toward producing more consistent gradients. The method is a plug-and-play module that can be attached to existing SDS-based models to improve geometric consistency, with minimal additional computation and without requiring additional networks or modules. The experiments reported in the paper indicate that GSD significantly improves performance, addressing geometric inconsistency in text-to-3D generation and improving the view consistency and fidelity of the generated 3D scenes. The paper also compares GSD with other SDS-based methods and examines how its three components relate to one another. In summary, the paper aims to improve the SDS process for text-driven 3D generation by incorporating 3D geometric information, resulting in more accurate and consistent 3D models.
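
To make two of the ideas above more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the point-based noise representation, the nearest-neighbour gather used as "warping", and the cosine-based form of the loss are all hypothetical choices made for this example. The first function attaches i.i.d. Gaussian noise to 3D points and builds a per-view noise map so that pixels seeing the same points share the same noise while keeping unit variance. The second evaluates a simple disagreement measure between gradients of two views at corresponding pixels.

```python
import numpy as np

def consistent_noise_map(point_noise, pixel_index, num_pixels, rng):
    """Build one view's flat noise map from noise attached to 3D points (hypothetical helper).

    point_noise: (P,) i.i.d. N(0, 1) samples, one per 3D point, shared by all views.
    pixel_index: (P,) integer flat pixel index each point projects to in this view,
                 or -1 where the point is not visible.
    """
    visible = pixel_index >= 0
    sums = np.bincount(pixel_index[visible], weights=point_noise[visible],
                       minlength=num_pixels)
    counts = np.bincount(pixel_index[visible], minlength=num_pixels)
    # Pixels that no point lands on fall back to fresh independent noise.
    noise = rng.standard_normal(num_pixels)
    hit = counts > 0
    # A sum of n independent N(0, 1) values has variance n, so dividing by
    # sqrt(n) keeps each pixel at unit variance.
    noise[hit] = sums[hit] / np.sqrt(counts[hit])
    return noise

def gradient_consistency_loss(grad_a, grad_b, corr_b_for_a, valid):
    """Mean (1 - cosine similarity) between view A gradients and warped view B gradients.

    grad_a, grad_b: (N, C) per-pixel gradient vectors for the two views.
    corr_b_for_a:   (N,) index of the pixel in view B matching each pixel of view A
                    (assumed to come from the scene geometry; any in-range value
                    is fine where `valid` is False).
    valid:          (N,) boolean mask of pixels that have a correspondence.
    """
    if not valid.any():
        return 0.0
    warped_b = grad_b[corr_b_for_a]  # nearest-neighbour gather standing in for warping
    dot = np.sum(grad_a * warped_b, axis=1)
    norms = np.linalg.norm(grad_a, axis=1) * np.linalg.norm(warped_b, axis=1) + 1e-8
    return float(np.mean(1.0 - dot[valid] / norms[valid]))

# Usage: two views of six points; pixels that see the same points share noise.
rng = np.random.default_rng(0)
point_noise = rng.standard_normal(6)
view_1 = consistent_noise_map(point_noise, np.array([0, 0, 1, -1, 2, 2]), 4, rng)
view_2 = consistent_noise_map(point_noise, np.array([3, 3, 0, 1, -1, -1]), 4, rng)
```

In this toy setup, pixel 0 of the first view and pixel 3 of the second view both see points 0 and 1, so they receive the same noise value. Note that the sketch only evaluates the loss. For the loss to update scene geometry, as the summary describes, the warping would have to be differentiable with respect to the geometry, for example in an automatic differentiation framework.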