Multi-Concept Customization of Text-to-Image Diffusion

Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

2023-06-21

Abstract: While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful to represent new concepts while enabling fast tuning (~6 minutes). Additionally, we can jointly train for multiple concepts or combine multiple fine-tuned models into one via closed-form constrained optimization. Our fine-tuned model generates variations of multiple new concepts and seamlessly composes them with existing concepts in novel settings. Our method outperforms or performs on par with several baselines and concurrent works in both qualitative and quantitative evaluations while being memory and computationally efficient.

Computer Vision and Pattern Recognition, Graphics, Machine Learning

What problem does this paper attempt to address?

This paper asks how an existing text-to-image generation model can quickly learn new concepts from a few examples and then combine them seamlessly with concepts it already knows when generating new images. Specifically, the paper focuses on the following challenges:

1. **Model Forgetting**: Adding a new concept should not cause the model to forget or alter the meaning of concepts it has already learned. For example, adding the concept "moon gate" should not cause the model to lose the concept "moon".
2. **Overfitting**: Because only a few training samples are available for a new concept, the model tends to overfit them, which reduces the variety of the generated images.
3. **Multi-Concept Combination**: The model should learn new concepts individually and also combine several of them to generate complex scenes, for example an image of a pet dog wearing sunglasses standing in front of a moon gate.

To address these challenges, the paper proposes the **Custom Diffusion** method. It optimizes only a small subset of the parameters in the text-to-image model (mainly the key and value mappings in the cross-attention layers), which is enough to learn new concepts efficiently and combine them with existing ones. According to the paper, the method outperforms or performs on par with several baselines in single-concept learning and in multi-concept composition.
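The idea of updating only the cross-attention key and value mappings while freezing everything else can be illustrated with a small sketch. This is not the authors' implementation: the parameter names (`cross_attn`, `to_k`, `to_v`), the helper functions `select_trainable` and `sgd_step`, and the toy values are all assumptions made for illustration, and plain gradient descent stands in for whatever optimizer the paper actually uses.

```python
def select_trainable(param_names):
    """Return only the key/value projections of cross-attention layers.

    The "cross_attn", "to_k" and "to_v" naming is hypothetical.
    """
    return [
        name for name in param_names
        if "cross_attn" in name and name.endswith(("to_k", "to_v"))
    ]


def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one gradient step to trainable parameters; copy the rest unchanged."""
    updated = {}
    for name, values in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen parameter
    return updated


# Toy parameters: each "weight matrix" is flattened to a short list.
params = {
    "block0.self_attn.to_k": [1.0, 2.0],
    "block0.cross_attn.to_q": [1.0, 2.0],
    "block0.cross_attn.to_k": [1.0, 2.0],
    "block0.cross_attn.to_v": [1.0, 2.0],
    "block0.conv.weight": [1.0, 2.0],
}
grads = {name: [1.0, 1.0] for name in params}

trainable = select_trainable(params)
updated = sgd_step(params, grads, set(trainable), lr=0.5)
```

In this toy setup only two of the five parameter groups change after the step, which mirrors the summary's point that a small trainable subset keeps fine-tuning light and leaves the rest of the model untouched.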

Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization?

MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation

Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

Non-confusing Generation of Customized Concepts in Diffusion Models

Customization Assistant for Text-to-image Generation

Learning to Customize Text-to-Image Diffusion In Diverse Context

Create Your World: Lifelong Text-to-Image Diffusion

SPDiffusion: Semantic Protection Diffusion for Multi-concept Text-to-image Generation

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Towards Lifelong Few-Shot Customization of Text-to-Image Diffusion

MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models

CustomText: Customized Textual Image Generation using Diffusion Models

Scaling Concept With Text-Guided Diffusion Models

Editing Massive Concepts in Text-to-Image Diffusion Models