Abstract: Diffusion models have been recognized for their ability to generate images that are not only visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) generation has been proposed to leverage region-specific positions and descriptions to enable more precise and controllable generation. However, previous methods primarily focus on UNet-based models (e.g., SD1.5 and SDXL), and limited effort has explored Multimodal Diffusion Transformers (MM-DiTs), which have demonstrated powerful image generation capabilities. Enabling MM-DiT for layout-to-image generation seems straightforward but is challenging due to the complexity of how layout is introduced, integrated, and balanced among multiple modalities. To this end, we explore various network variants to efficiently incorporate layout guidance into MM-DiT, and ultimately present SiamLayout. To inherit the advantages of MM-DiT, we use a separate set of network weights to process the layout, treating it as equally important as the image and text modalities. Meanwhile, to alleviate the competition among modalities, we decouple the image-layout interaction into a siamese branch alongside the image-text one and fuse the two at a later stage. Moreover, we contribute a large-scale layout dataset, named LayoutSAM, which includes 2.7 million image-text pairs and 10.7 million entities. Each entity is annotated with a bounding box and a detailed description. We further construct the LayoutSAM-Eval benchmark as a comprehensive tool for evaluating L2I generation quality. Finally, we introduce the Layout Designer, which taps into the potential of large language models in layout planning, transforming them into experts in layout generation and optimization. Our code, model, and dataset will be available at https://creatilayout.github.io.

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

LayoutDM: Transformer-based Diffusion Model for Layout Generation

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models

DiffX: Guide Your Layout to Cross-Modal Generative Modeling

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation

LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

Unifying Layout Generation with a Decoupled Diffusion Model

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

Continuous Layout Editing of Single Images with Diffusion Models

Spatial-Aware Latent Initialization for Controllable Image Generation

DogLayout: Denoising Diffusion GAN for Discrete and Continuous Layout Generation

Training-free Composite Scene Generation for Layout-to-Image Synthesis

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

Layout2image: Image Generation from Layout

Enhancing Image Layout Control with Loss-Guided Diffusion Models

Obtaining Favorable Layouts for Multiple Object Generation

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts