GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

Trapoom Ukarapol, Kevin Pruvost
2024-06-14
Abstract: Text-to-3D generation has shown promising results, yet common challenges persist, such as the Multi-face Janus problem and extended generation time for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, followed by refining geometry and texture using StableDiffusion. Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared to previous state-of-the-art methods. The project code is available at https://github.com/trapoom555/GradeADreamer.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses two common challenges in text-to-3D generation: the Multi-face Janus problem and the long time required to generate high-quality assets. Specifically:

1. **Multi-face Janus problem**: The generated 3D object is inconsistent across viewpoints, so it shows unrealistic geometry or repeated faces when seen from some angles. The resulting models look unnatural and incoherent.
2. **Excessively long generation time**: Existing text-to-3D methods usually need a long time to produce high-quality 3D assets, which is a bottleneck for practical applications.

To address these challenges, the paper proposes a method named GradeADreamer. It generates high-quality 3D assets through a three-stage training pipeline, with a total generation time of under 30 minutes on a single RTX 3090 GPU. The three stages are:

1. **Gaussian Splat prior generation**: Use the multi-view diffusion model MVDream to generate the initial Gaussian Splats, which reduces the occurrence of the Multi-face Janus problem.
2. **Gaussian Splat refinement**: Use StableDiffusion to refine the Gaussian Splats and improve the quality of geometric details.
3. **Texture optimization**: Use StableDiffusion to optimize the texture of the extracted mesh so that the final 3D model has high-quality textures.

Through these steps, GradeADreamer not only significantly reduces the incidence of the Multi-face Janus problem but also achieves a faster generation time than existing methods and performs well on multiple metrics such as user preference ranking and 3D-FID scores.
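
The three-stage structure described above can be sketched as a simple sequential pipeline. The following Python skeleton is only an illustration of the stage ordering and of which diffusion model guides each stage; it is not the authors' implementation. The names `Stage`, `make_stage_body`, `build_pipeline`, and `run_pipeline` are hypothetical, and the stage bodies are placeholders that record what a real stage would do rather than performing any optimization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str                    # label for the stage (hypothetical naming)
    guidance: str                # diffusion model that guides this stage
    representation: str          # what the stage is assumed to output
    run: Callable[[Dict], Dict]  # placeholder body: state in, state out


def make_stage_body(name: str, representation: str) -> Callable[[Dict], Dict]:
    """Return a placeholder body that only records the stage it stands for."""
    def body(state: Dict) -> Dict:
        new_state = dict(state)
        new_state["representation"] = representation
        new_state["history"] = state.get("history", []) + [name]
        return new_state
    return body


def build_pipeline() -> List[Stage]:
    # Stage order and guidance models follow the summary above.
    # The last stage operates on a mesh extracted from the refined splats.
    specs = [
        ("prior", "MVDream", "gaussian_splats"),
        ("geometry_refinement", "StableDiffusion", "gaussian_splats"),
        ("texture_optimization", "StableDiffusion", "textured_mesh"),
    ]
    return [Stage(n, g, r, make_stage_body(n, r)) for n, g, r in specs]


def run_pipeline(prompt: str, stages: List[Stage]) -> Dict:
    """Run the stages in order, threading a shared state dict through them."""
    state: Dict = {"prompt": prompt, "history": []}
    for stage in stages:
        state = stage.run(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("a ceramic teapot", build_pipeline())
    print(result["history"])
    print(result["representation"])
```

In a real system each placeholder body would be replaced by an optimization loop guided by the named diffusion model. The point of the sketch is only that the multi-view model supplies the initial prior, while StableDiffusion handles the two later refinement stages.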