Abstract: Layout generation is a foundational task of graphic design, which requires integrating visual aesthetics with harmonious content delivery. However, existing methods still struggle to generate precise and visually appealing layouts, producing defects such as blocking, overlapping, undersized elements, and spatial misalignment. We found that these methods overlook the crucial balance between learning content-aware and graphic-aware features. This oversight limits their ability to model the graphic structure of layouts and generate reasonable layout arrangements. To address these challenges, we introduce LayoutDiT, an effective framework that balances content and graphic features to generate high-quality, visually appealing layouts. Specifically, we first design an adaptive factor that optimizes the model's awareness of the layout generation space, balancing the model's performance in both content and graphic aspects. Secondly, we introduce a graphic condition, the saliency bounding box, to bridge the modality difference between images in the visual domain and layouts in the geometric parameter domain. In addition, we adapt a diffusion transformer model as the backbone, whose powerful generative capability ensures the quality of layout generation. Benefiting from the properties of diffusion models, our method excels in constrained settings without introducing additional constraint modules. Extensive experimental results demonstrate that our method achieves superior performance in both constrained and unconstrained settings, significantly outperforming existing methods.

What problem does this paper attempt to address?

This paper aims to solve the challenges that existing methods face in generating accurate and visually appealing layouts. Specifically, these challenges include:

1. **Blocking**: The generated layout elements may block key areas in the background image.
2. **Overlapping**: Layout elements may overlap one another, hurting overall aesthetics and readability.
3. **Small-sized**: The generated layout elements may be too small to display information clearly.
4. **Spatial misalignment**: The position and size of layout elements may be inconsistent with the content of the background image.

The root cause of these problems is that existing methods fail to balance the learning of content-aware and graphic-aware features. Specifically:

- **Insufficient content-awareness**: The model cannot fully understand the content of the background image, resulting in a layout that is inconsistent with the image.
- **Insufficient graphic-structure modeling**: The model fails to accurately capture the geometric structure of the layout, resulting in a layout that is visually less reasonable.

To solve these problems, the authors propose a new framework, **LayoutDiT**, which improves layout generation through the following innovations:

1. **Content-Graphic Balance Factor (CGBF)**:
   - An adaptive factor optimizes the model's perception of the layout generation space, achieving a better balance between content and graphics.
   - This factor is predicted by a trainable module and serves as a dynamic regulator that adjusts the interaction between layout representations and image features.
2. **Saliency Bounding Box**:
   - Geometric information is extracted from the saliency map and converted into the same modality as the layout elements, providing a more accurate basis for geometric feature alignment.
   - The saliency bounding box captures key shape information in the image, helping the model better understand the content of the background image.
3. **Diffusion Transformer Architecture**:
   - The strong generative ability of the diffusion model ensures high-quality layouts.
   - The properties of the diffusion model allow the method to work effectively under constraints without introducing additional constraint modules.

Through these designs, LayoutDiT can generate high-quality, visually appealing layouts while addressing the blocking, overlapping, small-sized, and spatial misalignment problems of existing methods.
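To make the saliency bounding box idea more concrete, the following is a minimal sketch of turning a saliency map into a box expressed in the same normalized `(cx, cy, w, h)` form that layout elements commonly use. It is an illustration under stated assumptions, not the authors' implementation: the function name `saliency_bounding_box`, the fixed `threshold`, the box parameterization, and the choice of a single tight box around all salient pixels are all hypothetical. The paper may extract boxes differently, for example one box per salient region.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format: (center_x, center_y, width, height), all normalized to [0, 1].
Box = Tuple[float, float, float, float]


def saliency_bounding_box(
    saliency: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cutoff; not taken from the paper
) -> Optional[Box]:
    """Return the tight box around all pixels with saliency >= threshold.

    `saliency` is a 2-D grid (rows of floats). Returns None when the grid is
    empty or no pixel reaches the threshold.
    """
    height = len(saliency)
    width = len(saliency[0]) if height else 0
    if height == 0 or width == 0:
        return None

    # Indices of rows and columns that contain at least one salient pixel.
    rows = [y for y, row in enumerate(saliency) if any(v >= threshold for v in row)]
    if not rows:
        return None
    cols = [x for x in range(width) if any(row[x] >= threshold for row in saliency)]

    # Pixel-space extents; bottom and right are exclusive.
    top, bottom = rows[0], rows[-1] + 1
    left, right = cols[0], cols[-1] + 1

    # Convert to normalized center/size form, the same kind of geometric
    # parameters used to describe layout elements.
    return (
        (left + right) / 2 / width,
        (top + bottom) / 2 / height,
        (right - left) / width,
        (bottom - top) / height,
    )
```

Because the result is a handful of geometric parameters rather than an image, it can be fed to a model alongside layout element boxes without a separate visual encoder for that signal, which is the modality-bridging role the summary describes.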

LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

LayoutDM: Transformer-based Diffusion Model for Layout Generation

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models

Unifying Layout Generation with a Decoupled Diffusion Model

DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

LayoutDM: Precision Multi-Scale Diffusion for Layout-to-Image

Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model

Dolfin: Diffusion Layout Transformers without Autoencoder

LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation

PLay: Parametrically Conditioned Layout Generation using Latent Diffusion

Two-stage Content-Aware Layout Generation for Poster Designs

Content-aware generative modeling of graphic design layouts

LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

Attribute-Conditioned Layout GAN for Automatic Graphic Design