InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

Chenguo Lin, Yuchen Lin, Panwang Pan, Xuanyang Zhang, Yadong Mu
2024-07-11
Abstract: Comprehending natural language instructions is a charming property for both 2D and 3D layout synthesis systems. Existing methods implicitly model object joint distributions and express object relations, hindering generation's controllability. We introduce InstructLayout, a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 2D and 3D layout synthesis. The proposed semantic graph prior learns layout appearances and object distributions simultaneously, demonstrating versatility across various downstream tasks in a zero-shot manner. To facilitate the benchmarking for text-driven 2D and 3D scene synthesis, we respectively curate two high-quality datasets of layout-instruction pairs from public Internet resources with large language and multimodal models. Extensive experimental results reveal that the proposed method outperforms existing state-of-the-art approaches by a large margin in both 2D and 3D layout synthesis tasks. Thorough ablation studies confirm the efficacy of crucial design components.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is how to generate controllable, high-quality 2D and 3D layouts from natural language instructions. Existing methods model the joint distribution of objects and express object relations only implicitly, which limits the controllability of the generated results. They also have difficulty handling complex natural language instructions, so the generated layouts lack precision and diversity.

To address these problems, the authors propose **InstructLayout**, a generative framework that combines a semantic graph prior with a layout decoder to improve the controllability and fidelity of 2D and 3D layout synthesis. The main contributions of this framework are:

1. **Introducing a semantic graph prior**: The prior learns layout appearances and object distributions simultaneously, which allows it to be applied to various downstream tasks in a zero-shot manner.
2. **Constructing high-quality datasets**: To support benchmarking of text-driven 2D and 3D scene synthesis, the authors curate two high-quality datasets of layout-instruction pairs from public Internet resources, using large language and multimodal models.
3. **Improving on existing generative models**: Extensive experiments show that the method outperforms existing state-of-the-art approaches by a large margin in both 2D and 3D layout synthesis tasks.

### Specific problem description

1. **Understanding natural language instructions**: Existing layout synthesis systems have difficulty interpreting natural language instructions accurately, so the generated layouts are hard to control.
2. **Improving generation quality**: Existing methods cannot ensure high fidelity and controllability at the same time when generating layouts.
3. **Modeling complex object relationships**: Existing methods have difficulty handling complex object relationships, so the generated layouts lack diversity and precision.

### Solution

**InstructLayout** addresses the above problems in the following ways:

1. **Semantic graph prior**: A semantic graph prior models object attributes and the layout distribution, which enables zero-shot use in various downstream tasks.
2. **Layout decoder**: The decoder generates concrete layout configurations that conform to the semantic graph prior and follow the instructions provided by the user.
3. **Two-stage scheme**: Discrete and continuous attributes are processed separately, which reduces the burden of network optimization and improves the quality and controllability of generation.

### Summary

The core problem of this paper is generating controllable, high-quality 2D and 3D layouts from natural language instructions. **InstructLayout** addresses the shortcomings of existing methods in controllability and generation quality by introducing a semantic graph prior and a layout decoder.
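To make the two-stage scheme from the Solution section more concrete, the following is a minimal toy sketch, not the authors' implementation. It only illustrates the split between a discrete stage (object categories and relations) and a continuous stage (box positions). All names here (`SemanticGraph`, `Box`, `sample_graph`, `decode_layout`, `RELATIONS`) are hypothetical, and the learned prior and decoder are replaced by keyword matching and random placement as stand-ins.

```python
from dataclasses import dataclass, field
import random
from typing import List, Tuple

# Hypothetical relation vocabulary, chosen for illustration only.
RELATIONS = ("left of", "right of")


@dataclass
class SemanticGraph:
    """Discrete part of a layout: object categories and pairwise relations."""
    categories: List[str]
    # Each relation is (subject index, relation name, object index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)


@dataclass
class Box:
    """Continuous part of a layout: a 2D box in normalized coordinates."""
    category: str
    x: float
    y: float
    w: float = 0.1
    h: float = 0.1


def sample_graph(instruction, vocabulary):
    """Stage 1 stand-in: keyword matching instead of a learned graph prior."""
    text = instruction.lower()
    cats = [c for c in vocabulary if c in text]
    rels = []
    for i, a in enumerate(cats):
        for j, b in enumerate(cats):
            for rel in RELATIONS:
                if i != j and f"{a} {rel} the {b}" in text:
                    rels.append((i, rel, j))
    return SemanticGraph(cats, rels)


def decode_layout(graph, seed=0):
    """Stage 2 stand-in: random positions, then one pass of x swaps.

    There is no conflict resolution, so contradictory relations
    are not guaranteed to be satisfied.
    """
    rng = random.Random(seed)
    boxes = [Box(c, rng.random(), rng.random()) for c in graph.categories]
    for i, rel, j in graph.relations:
        a, b = boxes[i], boxes[j]
        wrong_left = rel == "left of" and a.x >= b.x
        wrong_right = rel == "right of" and a.x <= b.x
        if wrong_left or wrong_right:
            a.x, b.x = b.x, a.x
    return boxes


graph = sample_graph("Place the chair left of the table", ("chair", "table"))
layout = decode_layout(graph)
```

In this sketch the instruction only influences the discrete graph, and the continuous stage only sees that graph. That separation is the point being illustrated: each stage handles one kind of attribute, which is the motivation the summary gives for the two-stage design.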