DiffGS: Functional Gaussian Splatting Diffusion

Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu
2024-10-30
Abstract: 3D Gaussian Splatting (3DGS) has shown convincing performance in rendering speed and fidelity, yet the generation of Gaussian Splatting remains a challenge due to its discreteness and unstructured nature. In this work, we propose DiffGS, a general Gaussian generator based on latent diffusion models. DiffGS is a powerful and efficient 3D generative model which is capable of generating Gaussian primitives at arbitrary numbers for high-fidelity rendering with rasterization. The key insight is to represent Gaussian Splatting in a disentangled manner via three novel functions to model Gaussian probabilities, colors and transforms. Through the novel disentanglement of 3DGS, we represent the discrete and unstructured 3DGS with continuous Gaussian Splatting functions, where we then train a latent diffusion model with the target of generating these Gaussian Splatting functions both unconditionally and conditionally. Meanwhile, we introduce a discretization algorithm to extract Gaussians at arbitrary numbers from the generated functions via octree-guided sampling and optimization. We explore DiffGS for various tasks, including unconditional generation, conditional generation from text, image, and partial 3DGS, as well as Point-to-Gaussian generation. We believe that DiffGS provides a new direction for flexibly modeling and generating Gaussian Splatting.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the difficulty of generating 3D Gaussian Splatting (3DGS). While 3DGS performs well in rendering speed and fidelity, generating high-quality 3D Gaussian splats remains hard because the representation is discrete and unstructured. Existing methods are either computationally expensive or limited in the number of Gaussians they can generate, often leading to information loss.

To overcome these challenges, the authors propose DiffGS, a general 3D Gaussian generator based on a latent diffusion model. DiffGS turns the discrete, unstructured 3DGS into a continuous representation by describing the splats with three novel functions: a Gaussian probability function, a Gaussian color function, and a Gaussian transformation function. A latent diffusion model is trained to generate these functions, and an octree-guided sampling and optimization algorithm extracts any number of Gaussian splats from the generated functions.

### Main Contributions

1. **Efficient Generation**: DiffGS, built on Gaussian splatting and latent diffusion models, is efficient in model training, inference, and shape rendering.
2. **Generality and Quality**: DiffGS generates native 3DGS without the need for voxelization, thus maintaining high quality and generality.
3. **Scalability**: DiffGS can generate any number of Gaussian splats.
4. **Multiple Tasks**: DiffGS performs well in various tasks, including unconditional generation, text-conditional generation, image-conditional generation, partial 3DGS-conditional generation, and point cloud to Gaussian splatting generation.

### Method Overview

1. **3D Gaussian Splatting Representation**: Representing 3D Gaussian splats through three functions (Gaussian probability function, Gaussian color function, and Gaussian transformation function).
2. **Gaussian Variational Autoencoder and Latent Diffusion Model**: Using a Gaussian variational autoencoder to compress Gaussian splatting functions and a latent diffusion model to generate new Gaussian splatting functions.
3. **Gaussian Extraction Algorithm**: Extracting Gaussian splats from the generated functions using an octree-guided sampling and optimization algorithm.

### Experimental Results

1. **Unconditional Generation**: On the ShapeNet dataset, DiffGS outperforms existing GAN and diffusion model methods in terms of the quality of generated shape renderings.
2. **Conditional Generation**: DiffGS shows strong capabilities in text-conditional generation, image-conditional generation, and partial 3DGS-conditional generation tasks, accurately recovering the semantics and geometric structures described by text and images.
3. **Point Cloud to Gaussian Splatting Generation**: DiffGS can generate high-quality Gaussian splats from 3D point clouds, providing an effective solution for converting between point clouds and 3DGS representations.

In summary, DiffGS offers a new direction for generating high-quality 3D Gaussian splats efficiently and demonstrates excellent performance across multiple tasks.
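
To make the Gaussian extraction step in the method overview more concrete, here is a minimal sketch of octree-guided sampling from a continuous probability function. It is an illustration under stated assumptions, not the authors' implementation: `toy_probability` is a hypothetical stand-in for a learned Gaussian probability function, and the pruning rule (a 3x3x3 probe lattice per cell), the threshold, and the rejection-sampling step are all choices made for this example. The optimization stage and the color and transformation queries are omitted.

```python
import math
import random


def toy_probability(point):
    """Hypothetical stand-in for a learned Gaussian probability function.

    Returns a value in (0, 1] that peaks on a sphere of radius 0.5.
    """
    x, y, z = point
    dist_to_surface = abs(math.sqrt(x * x + y * y + z * z) - 0.5)
    return math.exp(-((dist_to_surface / 0.1) ** 2))


def cell_is_occupied(prob_fn, center, half, threshold):
    """Probe a 3x3x3 lattice in the cell; keep the cell if any probe is likely."""
    cx, cy, cz = center
    offsets = (-half, 0.0, half)
    return any(
        prob_fn((cx + dx, cy + dy, cz + dz)) >= threshold
        for dx in offsets
        for dy in offsets
        for dz in offsets
    )


def build_octree_leaves(prob_fn, depth=4, threshold=0.05):
    """Subdivide [-1, 1]^3 coarse-to-fine, pruning cells with low probability."""
    cells = [((0.0, 0.0, 0.0), 1.0)]  # (center, half side length)
    for _ in range(depth):
        children = []
        for (cx, cy, cz), half in cells:
            q = half / 2.0
            for dx in (-q, q):
                for dy in (-q, q):
                    for dz in (-q, q):
                        child = (cx + dx, cy + dy, cz + dz)
                        if cell_is_occupied(prob_fn, child, q, threshold):
                            children.append((child, q))
        cells = children
    return cells


def sample_positions(prob_fn, leaves, num_points, threshold=0.05, seed=0):
    """Draw up to `num_points` positions from the leaves by rejection sampling."""
    rng = random.Random(seed)
    points = []
    attempts = 0
    while leaves and len(points) < num_points and attempts < 100 * num_points:
        attempts += 1
        (cx, cy, cz), half = rng.choice(leaves)
        candidate = (
            cx + rng.uniform(-half, half),
            cy + rng.uniform(-half, half),
            cz + rng.uniform(-half, half),
        )
        # Accept only candidates the probability function considers likely.
        if prob_fn(candidate) >= threshold:
            points.append(candidate)
    return points


leaves = build_octree_leaves(toy_probability)
positions = sample_positions(toy_probability, leaves, num_points=500)
print(len(leaves), "leaf cells;", len(positions), "sampled positions")
```

Because the number of positions is a parameter of the sampling step rather than of the function being sampled, the same continuous function can yield a small or large set of Gaussian centers, which is the property the summary refers to as generating any number of Gaussian splats.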