Abstract: Text-to-3D generation has shown promising results, yet common challenges remain, such as the Multi-face Janus problem and long generation times for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, and then refines geometry and texture using StableDiffusion. Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared to previous state-of-the-art methods. The project code is available at <a class="link-external link-https" href="https://github.com/trapoom555/GradeADreamer" rel="external noopener nofollow">this https URL</a>.

What problem does this paper attempt to address?

This paper addresses two common challenges in text-to-3D generation:

1. **Multi-face Janus problem**: The generated object is inconsistent across viewpoints, so it shows unrealistic geometry or duplicated faces from some angles and looks unnatural.
2. **Excessively long generation time**: Existing text-to-3D methods usually take a long time to produce high-quality 3D assets, which is a bottleneck for practical applications.

To address these challenges, the paper proposes GradeADreamer, a three-stage training pipeline that generates high-quality 3D assets in under 30 minutes in total on a single RTX 3090 GPU. The three stages are:

1. **Gaussian Splat prior generation**: A multi-view diffusion model (MVDream) generates the initial Gaussian Splats, which reduces the occurrence of the Multi-face Janus problem.
2. **Gaussian Splat refinement**: StableDiffusion refines the Gaussian Splats to improve geometric detail.
3. **Texture optimization**: StableDiffusion optimizes the texture on the extracted mesh so that the final 3D model has high-quality textures.

Through these steps, GradeADreamer significantly reduces the incidence of the Multi-face Janus problem, generates assets faster than existing methods, and performs well on metrics such as user preference ranking and 3D-FID scores.
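
The stage ordering described above can be written down as a small data structure. The following Python is only an illustrative sketch: the `Stage` class, its field names, and the stage labels are hypothetical and are not taken from the GradeADreamer codebase. Only the stage order, the guidance model for each stage, and the representation being optimized come from the summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage (hypothetical structure, for illustration only)."""
    name: str            # short label for the stage
    guidance: str        # diffusion model that guides the optimization
    representation: str  # 3D representation optimized in this stage


# Stage order as described in the summary; labels are assumptions.
PIPELINE = (
    Stage("prior generation", "MVDream", "Gaussian Splats"),
    Stage("geometry refinement", "StableDiffusion", "Gaussian Splats"),
    Stage("texture optimization", "StableDiffusion", "extracted mesh"),
)


def describe(pipeline):
    """Return one human-readable line per stage, in execution order."""
    return [
        f"{i}. {s.name}: {s.guidance} guides the {s.representation}"
        for i, s in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    for line in describe(PIPELINE):
        print(line)
```

The sketch makes one point from the summary explicit: the multi-view model appears only in the first stage, where it sets up a view-consistent prior, while both refinement stages rely on StableDiffusion.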

GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion

VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Creating High-quality 3D Content by Bridging the Gap Between Text-to-2D and Text-to-3D Generation

BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis

Turbo3D: Ultra-fast Text-to-3D Generation

EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior

GVGEN: Text-to-3D Generation with Volumetric Representation

Text-to-3D Using Gaussian Splatting

Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model

X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation

HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation