OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, Wenhan Luo
2024-07-20
Abstract: Personalization is an important topic in text-to-image generation, especially the challenging multi-concept personalization. Current multi-concept methods are struggling with identity preservation, occlusion, and the harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts within a single image. We propose a novel two-stage sampling solution. The first stage takes charge of layout generation and visual comprehension information collection for handling occlusions. The second one utilizes the acquired visual comprehension information and the designed noise blending to integrate multiple concepts while considering occlusions. We also observe that the initiation denoising timestep for noise blending is the key to identity preservation and layout. Moreover, our method can be combined with various single-concept models, such as LoRA and InstantID without additional tuning. Especially, LoRA models on civitai.com can be exploited directly. Extensive experiments demonstrate that OMG exhibits superior performance in multi-concept personalization.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses personalized text-to-image generation, and in particular multi-concept personalization. It targets the following key problems:

1. **Identity Preservation**: Current methods struggle to keep each concept distinct when several concepts appear together, which leads to identity degradation.
2. **Occlusion Handling**: When concepts occlude each other in the generated image, existing methods handle the occlusion poorly, resulting in layout conflicts and identity degradation.
3. **Foreground-Background Harmony**: In multi-concept personalized generation, there is often an unnatural lighting difference between the foreground objects and the background, which harms the overall visual effect.
4. **Efficiency**: Some existing methods require additional training or model optimization to merge multiple concepts, increasing the time and computational resources required.

To address these issues, the paper proposes a framework called OMG (Occlusion-friendly Personalized Multi-concept Generation). The framework has the following features:

- **Two-Stage Sampling Strategy**: The first stage generates a base image with a coherent layout and collects visual understanding information; the second stage uses this information for multi-concept personalized generation.
- **Concept Noise Blending**: Blending noise from different single-concept models in the latent space and attention layers alleviates the identity degradation problem in multi-concept generation. This method requires no additional model training or tuning and can be combined with other personalization frameworks (such as LoRA and InstantID).
- **Layout Protection Mechanism**: Retaining cross-attention maps during generation keeps the layout of the generated image consistent with the base image, which addresses the occlusion problem.
- **Concept Quantity Scalability**: OMG maintains good performance as the number of concepts increases.

Through quantitative analysis and qualitative comparison, the paper demonstrates that OMG outperforms existing methods in multi-concept personalized generation, particularly in identity preservation, occlusion friendliness, and image harmony. OMG also shows advantages in single-concept personalization, especially in color naturalness, where it surpasses methods like InstantID.
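The concept noise blending idea described above can be illustrated with a toy sketch. It is not the paper's implementation. It assumes that each single-concept model yields a noise prediction on the same latent grid, and that each concept has a binary mask marking its visible region (for example, derived from the first-stage image). The function name `blend_noise`, the nested-list latents, and the rule that later masks override earlier ones on overlap are all hypothetical choices made for illustration.

```python
def blend_noise(base_eps, concept_eps, masks):
    """Toy masked blending of noise predictions (illustrative assumption).

    base_eps:    2D list, noise predicted by the base model.
    concept_eps: list of 2D lists, one noise prediction per concept model.
    masks:       list of 2D lists of 0/1, the visible region of each concept.
    Returns a new 2D list: concept noise inside each mask, base noise elsewhere.
    """
    height, width = len(base_eps), len(base_eps[0])
    # Start from a copy of the base prediction so the input is not mutated.
    blended = [row[:] for row in base_eps]
    for eps, mask in zip(concept_eps, masks):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    # Inside a concept's visible region, use that concept's noise.
                    blended[y][x] = eps[y][x]
    return blended


# Two hypothetical concepts on a 2x2 latent grid.
base = [[0.0, 0.0], [0.0, 0.0]]
eps_a = [[1.0, 1.0], [1.0, 1.0]]
eps_b = [[2.0, 2.0], [2.0, 2.0]]
mask_a = [[1, 0], [0, 0]]
mask_b = [[0, 1], [0, 0]]

result = blend_noise(base, [eps_a, eps_b], [mask_a, mask_b])
```

Here the top-left cell takes concept A's noise, the top-right cell takes concept B's, and the bottom row keeps the base prediction. If the masks describe visible (occlusion-aware) regions, they do not overlap, so the override rule never comes into play.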