GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
2024-06-11
Abstract: We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at http://gala3d.github.io.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the generation of complex 3D scenes from textual descriptions, with a focus on high quality and controllable editing. Specifically:

1. **Problems with existing methods**: Existing text-to-3D generation models struggle with scenes that contain multiple objects and complex interactions among them. The resulting scenes are low in quality and show issues such as geometric distortions and blurry textures.
2. **Proposed method**: The paper proposes the GALA3D framework, which uses layout-guided Gaussian Splatting to generate high-quality 3D scenes and supports controllable editing. GALA3D addresses the issues above through the following approaches:
   - Using large language models (LLMs) to extract instance relationships from the textual description and generate a coarse initial layout.
   - Introducing adaptive geometric constraints that optimize the shape and distribution of the Gaussian ellipsoids to produce high-quality geometry.
   - Combining conditioned diffusion priors with an instance-scene compositional optimization mechanism to keep geometry, texture, scale, and object interactions consistent across the scene.
   - Iteratively refining the coarse layout through a layout refinement module so that it aligns with the generated scene.
3. **Experimental results**: Experiments show that GALA3D outperforms existing NeRF-based, voxel-based, and other Gaussian Splatting-based methods at generating complex 3D scenes with multiple objects across the reported metrics.

In summary, GALA3D aims to overcome the limitations of existing text-to-3D generation methods and achieve more realistic and controllable generation of complex scenes.
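
To make the idea of a layout-guided representation more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a layout can be modeled as one axis-aligned box per object instance, that Gaussian centers are initialized uniformly inside each box, and that a simple clamp can stand in for a layout constraint. The names `LayoutBox`, `init_gaussian_centers`, `clamp_to_box`, and `inside` are hypothetical, and the example layout is hand-written rather than produced by an LLM. The paper's adaptive geometric constraints are not specified in this summary and are likely more involved.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LayoutBox:
    """Hypothetical axis-aligned box for one object instance in a scene layout."""
    name: str
    center: Vec3
    size: Vec3  # full extents along x, y, z

    def bounds(self) -> List[Tuple[float, float]]:
        # (low, high) interval per axis
        return [(c - s / 2.0, c + s / 2.0) for c, s in zip(self.center, self.size)]


def init_gaussian_centers(box: LayoutBox, n: int, rng: random.Random) -> List[Vec3]:
    """Sample n Gaussian centers uniformly inside the box (illustrative init only)."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in box.bounds()) for _ in range(n)]


def clamp_to_box(point: Vec3, box: LayoutBox) -> Vec3:
    """Project a center back into its box: a crude stand-in for a layout constraint."""
    return tuple(min(max(p, lo), hi) for p, (lo, hi) in zip(point, box.bounds()))


def inside(point: Vec3, box: LayoutBox) -> bool:
    """Check whether a point lies within the box on every axis."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, box.bounds()))


# Hand-written stand-in for a coarse layout; values are made up for illustration.
layout = [
    LayoutBox("table", center=(0.0, 0.0, 0.4), size=(1.2, 0.8, 0.8)),
    LayoutBox("lamp", center=(0.3, 0.1, 1.0), size=(0.2, 0.2, 0.4)),
]

rng = random.Random(0)
# One set of Gaussian centers per instance, keyed by instance name.
centers = {box.name: init_gaussian_centers(box, 100, rng) for box in layout}
```

In a sketch like this, each instance keeps its own set of Gaussians tied to its box, which is one simple way to let per-object optimization and whole-scene optimization share the same layout. Updating `center` and `size` during optimization would correspond loosely to the layout refinement described above.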