GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

2024-06-11

Abstract: We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at http://gala3d.github.io.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses text-to-3D generation of complex scenes: producing high-quality, controllably editable 3D scenes containing multiple interacting objects from a textual description. Specifically:

1. **Problems with existing methods**: Existing text-to-3D generation models struggle with scenes that contain multiple objects and complex interactions among them. The resulting scenes are low in quality, with geometric distortions and blurry textures.
2. **Proposed method**: The paper proposes the GALA3D framework, which uses layout-guided generative Gaussian splatting to generate high-quality 3D scenes and supports controllable editing. GALA3D addresses the above issues through the following approaches:
   - Using large language models (LLMs) to extract instance relationships from the textual description and generate a coarse initial layout.
   - Introducing adaptive geometric constraints that optimize the shape and distribution of the Gaussian ellipsoids, yielding high-quality geometry.
   - Applying instance-scene compositional optimization with conditioned diffusion priors, so that the multiple objects in the generated scene stay semantically and spatially consistent.
   - Iteratively refining the coarse layout through a layout refinement module so that it aligns with the generated scene.
3. **Experimental results**: According to the paper's experiments, GALA3D outperforms existing NeRF-based, voxel-based, and other 3D Gaussian-based methods at generating complex 3D scenes with multiple objects across the reported metrics.

In summary, GALA3D aims to overcome the limitations of existing text-to-3D generation methods and achieve more realistic and controllable complex scene generation.
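The layout-guided idea summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the JSON schema, the `LayoutBox` fields, and the helper names (`parse_layout`, `to_local`, `contains`, `fraction_inside`) are all assumptions made for illustration. The sketch parses a coarse layout of the kind an LLM might be prompted to return, then measures how many Gaussian centers of an instance fall inside that instance's layout box, which is one simple way to picture a layout acting as a spatial constraint.

```python
import json
import math
from dataclasses import dataclass


@dataclass
class LayoutBox:
    """One instance in a coarse scene layout (hypothetical schema)."""
    name: str
    center: tuple  # (x, y, z) in scene coordinates
    size: tuple    # (width, height, depth) of the box
    yaw: float     # rotation about the vertical (y) axis, in radians


def parse_layout(llm_json):
    """Parse a JSON layout, assumed to be what an LLM was prompted to return."""
    return [
        LayoutBox(o["name"], tuple(o["center"]), tuple(o["size"]), o.get("yaw", 0.0))
        for o in json.loads(llm_json)["objects"]
    ]


def to_local(point, box):
    """Express a scene-space point in the box's own axis-aligned frame."""
    dx, dy, dz = (p - c for p, c in zip(point, box.center))
    cos_y, sin_y = math.cos(box.yaw), math.sin(box.yaw)
    # Undo the box's yaw, i.e. rotate by -yaw about the y axis.
    return (cos_y * dx - sin_y * dz, dy, sin_y * dx + cos_y * dz)


def contains(box, point):
    """True if the point lies inside the (possibly rotated) layout box."""
    local = to_local(point, box)
    return all(abs(v) <= s / 2 for v, s in zip(local, box.size))


def fraction_inside(box, gaussian_centers):
    """Share of Gaussian centers that fall inside the instance's box."""
    if not gaussian_centers:
        return 0.0
    return sum(contains(box, p) for p in gaussian_centers) / len(gaussian_centers)


# Hypothetical LLM output for the prompt "a table in the middle of a room".
layout = parse_layout(
    '{"objects": [{"name": "table", "center": [0, 0.5, 0], "size": [2, 1, 1]}]}'
)
table = layout[0]
centers = [(0.0, 0.5, 0.0), (0.9, 0.9, 0.4), (1.5, 0.5, 0.0), (0.0, 1.2, 0.0)]
print(fraction_inside(table, centers))  # 0.5
```

In an actual optimization loop, a hard inside/outside test like this would more plausibly be replaced by a differentiable penalty on Gaussians that drift outside their box, and the box parameters themselves would be updated during layout refinement. Both are left out here to keep the sketch short.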

DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation

Text-to-3D Using Gaussian Splatting

SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting

GVGEN: Text-to-3D Generation with Volumetric Representation

GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint

3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting

ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation

Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph

Urban Architect: Steerable 3D Urban Scene Generation with Layout Prior

DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

Text-to-3D Gaussian Splatting with Physics-Grounded Motion Generation

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

Comp4D: LLM-Guided Compositional 4D Scene Generation

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes