Abstract: Thanks to the rapid development of diffusion models, unprecedented progress has been witnessed in image synthesis. Prior works mostly rely on pre-trained linguistic models, but text is often too abstract to properly specify all the spatial properties of an image, e.g., the layout configuration of a scene, which leads to sub-optimal results in complex scene generation. In this paper, we achieve accurate complex scene generation by proposing a semantically controllable Layout-AWare diffusion model, termed LAW-Diffusion. Distinct from previous Layout-to-Image generation (L2I) methods that only explore category-aware relationships, LAW-Diffusion introduces a spatial dependency parser to encode the location-aware semantic coherence across objects as a layout embedding, and produces scenes with perceptually harmonious object styles and contextual relations. Specifically, we carefully instantiate each object's regional semantics as an object region map and leverage a location-aware cross-object attention module to capture the spatial dependencies among those disentangled representations. We further propose an adaptive guidance schedule for our layout guidance to mitigate the trade-off between the regional semantic alignment and the texture fidelity of generated objects. Moreover, LAW-Diffusion allows for instance reconfiguration while keeping the other regions of a synthesized image unchanged, by introducing a layout-aware latent grafting mechanism that recomposes its local regional semantics. To better verify the plausibility of generated scenes, we propose a new evaluation metric for the L2I task, dubbed Scene Relation Score (SRS), which measures how well the images preserve rational and harmonious relations among contextual objects. Comprehensive experiments demonstrate that LAW-Diffusion achieves state-of-the-art generative performance, especially in producing coherent object relations.
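The abstract describes object region maps, location-aware cross-object attention, and layout-aware latent grafting only at a high level. The NumPy block below is a minimal sketch under stated assumptions, not the LAW-Diffusion implementation. Boxes are assumed to be normalized (x0, y0, x1, y1) tuples. Positions outside a box are filled with an assumed shared `empty` vector. Attention is single-head and computed independently at every spatial location, with any positional encoding omitted, and the attended object features are averaged into one layout embedding. Grafting is modeled as plain mask blending of two latents. The function names `object_region_maps`, `cross_object_attention`, and `graft_latents` are hypothetical.

```python
import numpy as np


def object_region_maps(boxes, embeddings, empty, height, width):
    """Build one H x W region map per object (hypothetical helper).

    boxes: list of (x0, y0, x1, y1) in normalized [0, 1] coordinates (assumed format).
    embeddings: (N, D) array, one vector per object (e.g. a category embedding).
    empty: (D,) vector placed outside each object's box (assumption).
    Returns an (N, H, W, D) array.
    """
    n, d = embeddings.shape
    maps = np.broadcast_to(empty, (n, height, width, d)).copy()
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Convert normalized box edges to grid indices, covering partial cells.
        r0, r1 = int(np.floor(y0 * height)), int(np.ceil(y1 * height))
        c0, c1 = int(np.floor(x0 * width)), int(np.ceil(x1 * width))
        maps[i, r0:r1, c0:c1] = embeddings[i]
    return maps


def cross_object_attention(maps, wq, wk, wv):
    """Single-head attention across the N objects at each spatial location.

    maps: (N, H, W, D); wq, wk, wv: (D, D) projection matrices.
    Returns an (H, W, D) layout embedding: attended object features are
    averaged over objects (a simplifying assumption).
    """
    x = np.transpose(maps, (1, 2, 0, 3))  # (H, W, N, D): objects become the sequence axis
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])  # (H, W, N, N)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v  # (H, W, N, D)
    return out.mean(axis=2)


def graft_latents(original, edited, mask):
    """Keep `original` outside the mask and take `edited` inside it.

    mask: 1.0 inside the reconfigured instance's region, 0.0 elsewhere,
    broadcastable to the latent shape (assumed mask-blending formulation).
    """
    return mask * edited + (1.0 - mask) * original
```

For example, two overlapping boxes on a 4 x 4 grid with 8-dimensional embeddings yield region maps of shape (2, 4, 4, 8) and a layout embedding of shape (4, 4, 8). Because attention runs at each location separately, each object's embedding enters the computation only at positions inside its own box. In a sampler, `graft_latents` would typically be applied to the noisy latents at one or more denoising steps; that placement is also an assumption.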

Move Anything with Layered Scene Diffusion

Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

LayerDiffusion: Layered Controlled Image Editing with Diffusion Models

Diffusion-based Generation, Optimization, and Planning in 3D Scenes

Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis

Collage Diffusion

LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

3D Scene Diffusion Guidance using Scene Graphs

Generating Images with 3D Annotations Using Diffusion Models

Mixed Diffusion for 3D Indoor Scene Synthesis

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation

Collaborative Diffusion for Multi-Modal Face Generation and Editing

Joint Generative Modeling of Scene Graphs and Images via Diffusion Models

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

R3CD: Scene Graph to Image Generation with Relation-Aware Compositional Contrastive Control Diffusion

DORSal: Diffusion for Object-centric Representations of Scenes et al