Abstract: In recent years, advancements in AIGC (Artificial Intelligence Generated Content) technology have significantly enhanced the capabilities of large text-to-image models. Despite these improvements, controllable image generation remains a challenge. Current methods, such as training, forward guidance, and backward guidance, have notable limitations. The first two approaches either demand substantial computational resources or produce subpar results. The third approach depends on phenomena specific to certain model architectures, complicating its application to large-scale image generation. To address these issues, we propose a novel controllable generation framework that offers a generalized interpretation of backward guidance without relying on specific assumptions. Leveraging this framework, we introduce LSReGen, a large-scale layout-to-image method designed to generate high-quality, layout-compliant images. Experimental results show that LSReGen outperforms existing methods in the large-scale layout-to-image task, underscoring the effectiveness of our proposed framework. Our code and models will be open-sourced.

What problem does this paper attempt to address?

The main problem this paper addresses is that current controllable image generation methods struggle to generate large-scale, high-quality images that comply with a given layout. Specifically, the paper identifies the limitations of existing approaches (model training, forward guidance, and backward guidance):

1. **Model Training**: This approach achieves excellent control, but it requires substantial computing resources, especially for models with many parameters and for large-scale datasets.
2. **Forward Guidance**: This approach adds almost no computational overhead, but the generated images are of unsatisfactory quality; for example, mottling may occur.
3. **Backward Guidance**: This approach updates the intermediate variables of the denoising process through backpropagation and achieves good results at relatively low cost during inference. However, most backward-guidance methods rely on the cross-attention map phenomenon of specific model architectures, which limits their use in large-scale image generation.

To address these problems, the paper proposes a new controllable generation framework that provides a general interpretation of backward guidance without relying on specific assumptions or model-architecture features. Based on this framework, the authors introduce LSReGen, a method for generating high-quality, large-scale images that comply with the given layout. The experimental results show that LSReGen outperforms existing methods on large-scale layout-to-image tasks, verifying the effectiveness of the proposed framework.

### Main Contributions

1. **General Backward-Guidance Framework**: A training-free framework that gives a general explanation of backward guidance without relying on cross-attention maps.
2. **Large-Scale Layout-to-Image Method**: Building on this framework, LSReGen generates high-quality, large-scale images that comply with the given layout.
3. **Experimental Verification**: LSReGen outperforms existing methods on large-scale layout-to-image tasks, further verifying the effectiveness of the proposed framework.

### Method Overview

- **Backward-Guidance Framework**: The framework defines a feature extraction method and a distance function, takes control information as input, and iteratively updates the intermediate variables so that their extracted features gradually approach the target features.
- **Large-Scale Region Generator**: A pre-trained, low-parameter layout-to-image model (such as GLIGEN) serves as the feature extractor. Layout features are captured by upsampling and adding noise, and the squared L2 norm measures the distance between features during generation.

In conclusion, this paper aims to overcome the limitations of existing controllable image generation methods, particularly for generating large-scale, high-quality, layout-compliant images, by proposing a new backward-guidance framework and a corresponding generation method.
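The guidance loop described in the Method Overview can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: gradients are estimated with central finite differences instead of backpropagation through a denoising network, the feature extractor is the identity function, the reference image is a small hand-written array standing in for a low-resolution GLIGEN output, and the noising step assumes the standard DDPM forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. All function names (`upsample_nearest`, `add_noise`, `backward_guidance_step`, and so on) are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Image = List[List[float]]


def upsample_nearest(img: Image, factor: int) -> Image:
    """Nearest-neighbour upsampling: each pixel becomes a factor x factor block."""
    h, w = len(img), len(img[0])
    return [[img[i // factor][j // factor] for j in range(w * factor)]
            for i in range(h * factor)]


def flatten(img: Image) -> Vector:
    """Row-major flattening so images and latents share one vector type."""
    return [v for row in img for v in row]


def add_noise(x0: Vector, alpha_bar: float, rng: random.Random) -> Vector:
    """Assumed DDPM forward process: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * v + n * rng.gauss(0.0, 1.0) for v in x0]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared L2 distance, the feature distance named in the summary."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def numerical_grad(loss: Callable[[Vector], float], z: Vector, h: float = 1e-5) -> Vector:
    """Central finite differences; a stand-in for autograd backpropagation (assumption)."""
    grad = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        grad.append((loss(zp) - loss(zm)) / (2.0 * h))
    return grad


def backward_guidance_step(z: Vector, feature_fn: Callable[[Vector], Vector],
                           target: Vector, step_size: float) -> Vector:
    """One guidance update: z <- z - step_size * grad_z ||feature_fn(z) - target||^2."""
    def loss(v: Vector) -> float:
        return sq_l2(feature_fn(v), target)

    g = numerical_grad(loss, z)
    return [zi - step_size * gi for zi, gi in zip(z, g)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for a low-resolution GLIGEN output (assumption).
    low_res = [[0.0, 1.0], [1.0, 0.0]]
    # Upsample to the working resolution, then noise it to the current noise level.
    target = add_noise(flatten(upsample_nearest(low_res, 2)), alpha_bar=0.5, rng=rng)
    z = [rng.gauss(0.0, 1.0) for _ in target]  # current noisy latent

    def identity(v: Vector) -> Vector:
        return v  # assumed feature extractor; a real one would be a network

    print(f"distance before: {sq_l2(z, target):.4f}")
    for _ in range(3):
        z = backward_guidance_step(z, identity, target, step_size=0.1)
    print(f"distance after:  {sq_l2(z, target):.4f}")
```

Because the distance is the squared L2 norm, each step with the identity extractor moves the latent a fixed fraction (2 × `step_size`) of the way toward the noisy reference. Swapping in a different `feature_fn` changes what "close" means without touching the update rule, which mirrors the summary's point that the framework needs only a feature extractor and a distance, not cross-attention maps.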

LSReGen: Large-Scale Regional Generator via Backward Guidance Framework

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge.

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

LLMGA: Multimodal Large Language Model based Generation Assistant

RealtimeGen: an Intervenable AI Image Generation System for Commercial Digital Art Asset Creators

GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation

Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis

Image generation step by step: animation generation-image translation

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model

Training-Free Layout Control with Cross-Attention Guidance

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

Unified Text-to-Image Generation and Retrieval

Generative Active Learning for Long-tailed Instance Segmentation

IRGen: Generative Modeling for Image Retrieval

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks