Abstract: Existing feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when the prompt view direction changes, and they mainly handle object-centric prompt images. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency, which allows the model to generate robustly given prompt views of any direction, beyond object-centric inputs. In addition, to improve the capability and generalization ability of DiffusionGS, we scale up 3D training data by developing a scene-object mixed training strategy. Experiments show that our method achieves better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA methods. The user study and text-to-3D applications also reveal the practical value of our method. Our project page at https://caiyuanhao1998.github.io/project/DiffusionGS/ shows the video and interactive generation results.

What problem does this paper attempt to address?

This paper addresses several key challenges that existing feed-forward image-to-3D generation methods face when given a single-view input:

1. **3D Consistency Problem**: Existing multi-view diffusion models cannot guarantee 3D consistency during generation, so they easily collapse when the direction of the prompt view changes. These methods mainly handle object-centric prompt images and offer insufficient support for complex scenes.
2. **Generation Quality and Speed**: Existing methods are limited in generation quality and speed, especially on large-scale scenes. For example, methods based on tri-plane NeRF are difficult to scale to larger scenes because volume rendering is slow and resolution is limited.
3. **Data Generalization Ability**: Current methods are mainly trained on object-centric datasets, which limits the generalization ability of the model, particularly for large-scale scene generation.

To address these problems, the authors propose a new single-stage 3D Gaussian point-cloud diffusion model (DiffusionGS), which can generate 3D objects and scenes from a single view and has the following characteristics:

- **3D Consistency**: By predicting multi-view pixel-aligned Gaussian primitives at each timestep, DiffusionGS enforces view consistency of the generated content, enabling robust generation from prompt views in any direction.
- **Fast Inference**: Using highly parallel rasterization and a scalable imaging range, DiffusionGS runs inference in approximately 6 seconds on a single A100 GPU.
- **Hybrid Training Strategy**: To improve the generalization ability and generation quality of the model, the authors develop a scene-object hybrid training strategy that adapts to different types of 3D data by controlling the distribution of selected views, camera conditions, Gaussian point clouds, and imaging depths.
- **New Camera Pose Encoding Method**: A new camera pose encoding, reference point Plücker coordinates (RPPC), is designed to better perceive depth and 3D geometric structure.

Experimental results show that DiffusionGS significantly outperforms existing methods in generation quality and speed, with 2.20 dB higher PSNR, 23.25 lower FID, and more than 5x faster inference.
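
To make the camera pose encoding concrete, the following is a minimal sketch of per-pixel ray encodings in NumPy. The standard Plücker encoding (moment `o × d` concatenated with direction `d`) is well established. The summary above does not state the exact RPPC formula, so the reference-point variant here is an assumption: it replaces the moment vector with the point on each ray closest to the world origin, `o - (o·d)d`. All function names (`pixel_rays`, `plucker`, `reference_point_plucker`) are hypothetical and not taken from the authors' code.

```python
import numpy as np

def pixel_rays(K, c2w, height, width):
    """Per-pixel ray origins and unit directions in world space.

    K is a 3x3 pinhole intrinsic matrix, c2w a 4x4 camera-to-world matrix.
    """
    # Pixel centers on an (H, W) grid.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)       # (H, W, 3)
    dirs_cam = pix @ np.linalg.inv(K).T                    # back-project pixels
    dirs = dirs_cam @ c2w[:3, :3].T                        # rotate to world frame
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)   # normalize to unit length
    origins = np.broadcast_to(c2w[:3, 3], dirs.shape)      # camera center per pixel
    return origins, dirs

def plucker(origins, dirs):
    """Standard Plücker ray encoding: (o x d, d), shape (..., 6)."""
    return np.concatenate([np.cross(origins, dirs), dirs], axis=-1)

def reference_point_plucker(origins, dirs):
    """ASSUMED reference-point variant: (o - (o.d) d, d), shape (..., 6).

    The first three channels hold the point on the ray nearest the world
    origin instead of the moment vector. Not confirmed by the summary above.
    """
    t = np.sum(origins * dirs, axis=-1, keepdims=True)
    return np.concatenate([origins - t * dirs, dirs], axis=-1)

# Usage with an illustrative camera (all values are made up).
K = np.array([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
c2w = np.eye(4)
c2w[:3, 3] = [0.5, -0.2, 2.0]
o, d = pixel_rays(K, c2w, 64, 64)
p = plucker(o, d)
r = reference_point_plucker(o, d)
```

Under this assumption both encodings have six channels per pixel, and for unit directions the moment and the reference point have equal norm. The difference is that the reference point is an actual 3D location on the ray, which is one plausible reason such an encoding could carry more explicit depth and position information than the moment vector.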

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation

GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction

DiffGS: Functional Gaussian Splatting Diffusion

Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting

3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors

A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision

ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

GET3DGS: Generate 3D Gaussians Based on Points Deformation Fields

AGG: Amortized Generative 3D Gaussians for Single Image to 3D

3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion

GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

Taming 3DGS: High-Quality Radiance Fields with Limited Resources