AnyScene: Customized Image Synthesis with Composited Foreground

Ruidong Chen, Lanjun Wang, Weizhi Nie, Yongdong Zhang, Anan Liu
DOI: https://doi.org/10.1109/cvpr52733.2024.00833
2024-01-01
Abstract: Recent progress in text-to-image technology has significantly advanced the field of image customization. Among various applications, the task of customizing diverse scenes for user-specified composited elements has considerable practical value but has not been extensively explored. To address this gap, we propose AnyScene, a specialized framework designed to create varied scenes from a composited foreground using textual prompts. AnyScene addresses the primary challenges inherent in existing methods, particularly scene disharmony due to a lack of foreground semantic understanding, and distortion of foreground elements. Specifically, we develop a foreground injection module that guides a pretrained diffusion model to generate cohesive scenes in visual harmony with the provided foreground. To make generation more robust, we implement a layout control strategy that prevents distortion of foreground elements. Furthermore, an efficient image blending mechanism seamlessly reintegrates foreground details into the generated scenes, producing outputs with overall visual harmony and precise foreground details. In addition, we propose a new benchmark and a series of quantitative metrics to evaluate this image customization task. Extensive experimental results demonstrate the effectiveness of AnyScene and confirm its potential in various applications.
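The abstract does not specify how the blending mechanism works, so the following is only an illustrative sketch of the general idea of reintegrating foreground pixels into a generated scene. It uses plain alpha compositing with a soft mask, which is a stand-in technique and not necessarily what AnyScene does; the function name `alpha_blend` and its parameters are hypothetical.

```python
import numpy as np


def alpha_blend(foreground, scene, mask):
    """Composite `foreground` over `scene` with a soft mask in [0, 1].

    Hypothetical helper: plain alpha compositing, used here only as a
    stand-in for the paper's (unspecified) blending mechanism.
    foreground, scene: H x W x C uint8 arrays; mask: H x W float array.
    """
    fg = foreground.astype(np.float32)
    bg = scene.astype(np.float32)
    # Clip the mask and add a channel axis so it broadcasts over C.
    alpha = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]
    out = alpha * fg + (1.0 - alpha) * bg
    return np.rint(out).astype(np.uint8)


# Tiny example: a 1 x 3 image with mask values 1.0, 0.5, and 0.0.
fg = np.full((1, 3, 3), 200, dtype=np.uint8)
bg = np.full((1, 3, 3), 100, dtype=np.uint8)
mask = np.array([[1.0, 0.5, 0.0]])
blended = alpha_blend(fg, bg, mask)
```

Where the mask is 1 the original foreground pixel is kept exactly, where it is 0 the generated scene is kept, and intermediate values give a smooth transition at the boundary.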