Abstract: Text-to-3D content creation has recently received much attention, especially with the rise of 3D Gaussian Splatting (GS). In general, GS-based methods comprise two key stages: initialization and rendering optimization. For initialization, existing works either directly apply random sphere initialization or use 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes remain similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., "a dog", but not from lexically richer texts, e.g., "a dog is sitting on the top of the airplane". To address these problems, this paper proposes a novel general framework to boost 3D GS initialization for text-to-3D generation by lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling spatial interaction among the 3D Gaussians and semantic interaction between the Gaussians and the texts. Specifically, we first construct a voxelized representation in which each voxel holds a 3D Gaussian whose position, scale, and rotation are fixed, leaving opacity as the sole factor that determines whether a position is occupied. We then design an initialization network consisting mainly of two novel components: 1) a Global Information Perception (GIP) block and 2) a Gaussians-Text Fusion (GTF) block. This design enables each 3D Gaussian to assimilate spatial information from other regions and semantic information from the text. Extensive experiments on lexically simple, medium, and hard texts show that our framework produces higher-quality 3D GS initialization than existing methods, e.g., Shap-E. Moreover, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.
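The voxelized representation described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `VoxelGaussian`, `build_voxel_grid`, `occupied_centers`, and `cross_attention`, the cubic grid over [-1, 1]^3, the isotropic scale of half a voxel edge, the identity rotation, the sigmoid over a single opacity logit, and the 0.5 occupancy threshold are all hypothetical choices. The abstract does not specify the internals of the GIP and GTF blocks; the sketch assumes GTF behaves like standard scaled dot-product cross-attention from per-Gaussian features (queries) to text-token embeddings (keys and values). Calling the same function with Gaussian features as queries, keys, and values yields self-attention, which is one plausible reading of how GIP lets each Gaussian see other regions.

```python
import math
from dataclasses import dataclass

# Hypothetical sketch: grid extent, isotropic scale, identity rotation,
# sigmoid opacity, threshold, and the attention form are assumptions.


@dataclass
class VoxelGaussian:
    center: tuple  # fixed: voxel center (x, y, z)
    scale: float  # fixed: isotropic, half the voxel edge (assumption)
    rotation: tuple  # fixed: identity quaternion (w, x, y, z)
    opacity_logit: float = 0.0  # the only free parameter

    @property
    def opacity(self) -> float:
        # Sigmoid maps the free logit into (0, 1).
        return 1.0 / (1.0 + math.exp(-self.opacity_logit))


def build_voxel_grid(resolution, extent=1.0):
    """Place one Gaussian at the center of each voxel of a resolution^3 grid over [-extent, extent]^3."""
    edge = 2.0 * extent / resolution
    grid = []
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                center = tuple(-extent + (n + 0.5) * edge for n in (i, j, k))
                grid.append(VoxelGaussian(center, edge / 2.0, (1.0, 0.0, 0.0, 0.0)))
    return grid


def occupied_centers(grid, threshold=0.5):
    """Opacity alone decides occupancy: keep voxels whose opacity exceeds the threshold."""
    return [g.center for g in grid if g.opacity > threshold]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query (e.g. one per Gaussian) attends over all keys."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(len(values[0]))])
    return out


if __name__ == "__main__":
    grid = build_voxel_grid(resolution=4)
    print(len(grid))  # 64
    # Pretend a network predicted high opacity for the first 10 voxels.
    for g in grid[:10]:
        g.opacity_logit = 3.0
    print(len(occupied_centers(grid)))  # 10
```

Fixing position, scale, and rotation means the grid shape is fully determined by one scalar per voxel, so an initialization network only has to predict opacities. In this assumed design, the attention output for each Gaussian would then be mapped to its opacity logit by a prediction head, which is not shown here.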

Text-to-3D Using Gaussian Splatting

GVGEN: Text-to-3D Generation with Volumetric Representation

Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness

BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

GET3DGS: Generate 3D Gaussians Based on Points Deformation Fields

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise