GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise

Xinhai Li, Huaibin Wang, Kuo-Kun Tseng
2023-11-19
Abstract: Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the combination of NeRF and 2D diffusion models frequently yields oversaturated images, which severely limits downstream industrial applications because of the constraints of the pixel-wise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text-to-3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through the transparency of individual Gaussian spheres, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly limits the complexity and accuracy of the models that can be produced. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We devise an efficient method that produces Gaussian noise for diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts such as floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper primarily aims to address the following issues:

1. **Multi-view Geometric Consistency Problem**: Current methods struggle to keep renderings consistent across viewpoints when learning 3D content from 2D diffusion models. Because the 2D diffusion model is queried view by view from the same text prompt, the resulting geometry is often inconsistent between views.
2. **Rendering Speed Limitation**: NeRF (Neural Radiance Fields) and its derivative methods rely on point querying and rendering, resulting in a slow rendering process. This limits the development of related applications and makes it particularly difficult to scale 3D generation frameworks in practical projects.
3. **Local Extremum Problem**: Content generation based on pure 3D Gaussian models tends to trap the model in local extrema, leading to artifacts such as floaters, burrs, or proliferative elements.

To address these issues, the paper makes the following contributions:

- **A Novel Text-to-3D Framework Based on Gaussian Splatting**: The paper proposes GaussianDiffusion, described as the first framework to combine a Gaussian splatting rendering pipeline with the score function of a Langevin dynamics diffusion model. This method significantly accelerates rendering and, according to the authors, produces the most realistic text-to-3D appearances to date.
- **Structured Noise**: Structured noise from different viewpoints is introduced to address the multi-view geometric consistency problem. Injecting noise that derives from a shared source imposes a noise constraint across the rendered views, which targets multi-faceted structure artifacts in particular.
- **Variational Gaussian Splatting Model**: To resolve the conflict between accurate Gaussian modeling and the instability of 2D diffusion models under multi-view conditions, the paper introduces a variational Gaussian splatting model. This mitigates the risk of the 3D Gaussian model converging to local extrema, thereby reducing artifacts such as floaters, burrs, or proliferative elements.

In summary, through these contributions the paper addresses key challenges in current text-to-3D generation: multi-view geometric consistency, rendering speed, and model stability.
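
The idea of per-view Gaussian noise that all originates from a shared noise source can be illustrated with a small NumPy sketch. This is not the authors' implementation, and the text above does not specify their construction. The sketch assumes a hypothetical noise source made of random 3D points in the unit ball, each carrying one standard normal value, an orthographic camera rotated about the vertical axis, and per-pixel averaging scaled by the square root of the point count so that every pixel stays unit-variance Gaussian. The function names `make_noise_source` and `view_noise` are illustrative only.

```python
import numpy as np


def rotation_y(angle_rad):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def make_noise_source(num_points, rng):
    """Hypothetical shared noise source: points uniform in the unit ball,
    each carrying one i.i.d. standard normal value."""
    directions = rng.standard_normal((num_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random((num_points, 1)) ** (1.0 / 3.0)
    values = rng.standard_normal(num_points)
    return directions * radii, values


def view_noise(points, values, azimuth_rad, resolution, rng):
    """Project the shared source into one view (assumed orthographic camera).

    Values landing in the same pixel are summed and divided by sqrt(count),
    which keeps each covered pixel distributed as N(0, 1). Pixels that no
    point reaches fall back to fresh independent N(0, 1) noise.
    """
    rotated = points @ rotation_y(azimuth_rad).T
    # Drop depth and map x, y from [-1, 1] to integer pixel coordinates.
    pix = np.floor((rotated[:, :2] + 1.0) * 0.5 * resolution).astype(int)
    pix = np.clip(pix, 0, resolution - 1)
    flat = pix[:, 1] * resolution + pix[:, 0]

    num_pixels = resolution * resolution
    sums = np.bincount(flat, weights=values, minlength=num_pixels)
    counts = np.bincount(flat, minlength=num_pixels)

    noise = rng.standard_normal(num_pixels)  # fallback for uncovered pixels
    covered = counts > 0
    noise[covered] = sums[covered] / np.sqrt(counts[covered])
    return noise.reshape(resolution, resolution)


# Usage: two nearby views share most of their noise, so they are correlated.
rng = np.random.default_rng(0)
points, values = make_noise_source(50_000, rng)
front = view_noise(points, values, 0.0, 32, rng)
nearby = view_noise(points, values, 0.02, 32, rng)
print("correlation between nearby views:",
      np.corrcoef(front.ravel(), nearby.ravel())[0, 1])
```

In this sketch each view on its own still looks like ordinary Gaussian noise, which is what a diffusion model expects as input perturbation, while views that are close in angle receive strongly correlated noise. The correlation is the kind of cross-view constraint that the structured noise contribution refers to, under the assumptions stated above.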