Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations

Haoming Liu, Yuanhe Guo, Shengjie Wang, Hongyi Wen
DOI: https://doi.org/10.48550/arXiv.2312.08873
2024-09-09
Abstract: Diffusion models, capable of high-quality image generation, receive unparalleled popularity for their ease of extension. Active users have created a massive collection of domain-specific diffusion models by fine-tuning base models on self-collected datasets. Recent work has focused on improving a single diffusion model by uncovering semantic and visual information encoded in various architecture components. However, those methods overlook the vastly available set of fine-tuned diffusion models and, therefore, miss the opportunity to utilize their combined capacity for novel generation. In this work, we propose Diffusion Cocktail (Ditail), a training-free method that transfers style and content information between multiple diffusion models. This allows us to perform diversified generations using a set of diffusion models, resulting in novel images unobtainable by a single model. Ditail also offers fine-grained control of the generation process, which enables flexible manipulations of styles and contents. With these properties, Ditail excels in numerous applications, including style transfer guided by diffusion models, novel-style image generation, and image manipulation via prompts or collage inputs.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to combine multiple domain-specific diffusion models (DMs) for diversified image generation. Most existing research focuses on improving a single diffusion model and overlooks the richer, more diverse generation that becomes possible when multiple fine-tuned diffusion models are combined. The paper proposes Diffusion Cocktail (Ditail), a training-free method that transfers style and content information between multiple diffusion models, so that a set of models can be used together to generate novel images that no single model can produce. Ditail also provides fine-grained control over the generation process, enabling flexible manipulation of style and content. It applies to several scenarios, including style transfer guided by diffusion models, novel-style image generation, and image manipulation via text prompts or collage inputs.