Layout Generation Agents with Large Language Models

Yuichi Sasazawa, Yasuhiro Sogawa

2024-05-13

Abstract: In recent years, there has been an increasing demand for customizable 3D virtual spaces. Because creating these virtual spaces requires significant human effort, there is a need to make the process more efficient. While existing studies have proposed methods for automatically generating layouts such as floor plans and furniture arrangements, these methods only generate text indicating the layout structure based on user instructions, without utilizing the information obtained during the generation process. In this study, we propose an agent-driven layout generation system using the GPT-4V multimodal large language model and validate its effectiveness. Specifically, the language model manipulates agents to sequentially place objects in the virtual space, thus generating layouts that reflect user instructions. Experimental results confirm that our proposed method can generate virtual spaces reflecting user instructions with a high success rate. Additionally, we identified the elements that contribute to improved behavior generation performance through an ablation study.

Human-Computer Interaction, Artificial Intelligence

What problem does this paper attempt to address?

This paper aims to make 3D virtual space layout generation more efficient and more customizable. Existing methods can automatically generate layouts such as house floor plans and furniture arrangements, but they only produce text describing the layout structure from user instructions. They do not make full use of the information obtained during the generation process, especially the visual information of the virtual space. In addition, most existing methods target specific layout types, such as home floor plans and furniture placement, give little consideration to general-purpose applications, and make it difficult to modify an existing space according to instructions. In response, the paper proposes an agent-driven layout generation system based on large language models (LLMs). The system lets the LLM manipulate agents that place objects one by one in the virtual space, generating layouts that reflect user instructions. The research verifies the effectiveness of the proposed method through experiments and identifies the key factors contributing to improved behavior generation performance through ablation studies.
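
The place-one-object-at-a-time loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid-based `Room`, the `run_agent` loop, and `scripted_policy` are all hypothetical names, the state is passed as text rather than as an image, and a hard-coded policy stands in for the GPT-4V call, whose prompts and action format are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Action = Tuple[str, int, int]  # (object name, x, y); hypothetical action format


@dataclass
class Placement:
    name: str
    x: int
    y: int


@dataclass
class Room:
    """Hypothetical grid room; each cell holds at most one object."""
    width: int
    height: int
    placed: List[Placement] = field(default_factory=list)

    def is_free(self, x: int, y: int) -> bool:
        in_bounds = 0 <= x < self.width and 0 <= y < self.height
        return in_bounds and all((p.x, p.y) != (x, y) for p in self.placed)

    def place(self, name: str, x: int, y: int) -> bool:
        # Reject out-of-bounds or occupied cells instead of raising.
        if not self.is_free(x, y):
            return False
        self.placed.append(Placement(name, x, y))
        return True

    def describe(self) -> str:
        # Text stand-in for the observation the model would receive.
        return "; ".join(f"{p.name}@({p.x},{p.y})" for p in self.placed) or "empty"


# A policy maps (instruction, current state) to the next action, or None to stop.
Policy = Callable[[str, str], Optional[Action]]


def run_agent(room: Room, instruction: str, policy: Policy, max_steps: int = 20) -> Room:
    """Ask the policy for one placement at a time until it stops or steps run out."""
    for _ in range(max_steps):
        action = policy(instruction, room.describe())
        if action is None:
            break
        room.place(*action)
    return room


def scripted_policy(instruction: str, state: str) -> Optional[Action]:
    """Assumption: a fixed plan replaces the language model for this sketch."""
    plan = [("bed", 0, 0), ("desk", 2, 0), ("chair", 2, 1)]
    for name, x, y in plan:
        if f"{name}@" not in state:
            return (name, x, y)
    return None


room = run_agent(Room(4, 4), "furnish a bedroom", scripted_policy)
print(room.describe())
```

Feeding the updated state back to the policy after every placement is the point of the sketch: each decision can depend on what has already been placed, which a single-shot text layout cannot do.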
