DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning

Yuxuan Duan, Yan Hong, Bo Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Li Niu, Liqing Zhang
2024-11-07
Abstract: The recent progress in text-to-image models pretrained on large-scale datasets has enabled us to generate various images as long as we provide a text prompt describing what we want. Nevertheless, the availability of these models is still limited when we expect to generate images that fall into a specific domain either hard to describe or just unseen to the models. In this work, we propose DomainGallery, a few-shot domain-driven image generation method which aims at finetuning pretrained Stable Diffusion on few-shot target datasets in an attribute-centric manner. Specifically, DomainGallery features prior attribute erasure, attribute disentanglement, regularization and enhancement. These techniques are tailored to few-shot domain-driven generation in order to solve key issues that previous works have failed to settle. Extensive experiments are given to validate the superior performance of DomainGallery on a variety of domain-driven generation scenarios. Codes are available at https://github.com/Ldhlwh/DomainGallery.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper "DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning" addresses the challenge of generating domain-specific images under few-shot conditions. The authors propose DomainGallery, a method that finetunes a pretrained Stable Diffusion model on a small target dataset in an attribute-centric manner.

### Background and Motivation

1. **Limitations of Existing Models**:
   - Large-scale pretrained text-to-image (T2I) models such as Stable Diffusion can generate a wide variety of images, but they perform poorly in specific domains (e.g., an artist's hand-drawn sketches) that are either hard to describe in words or unseen by the model.
   - Training a generative model from scratch usually requires a large amount of data, so it is not feasible with only a few samples.
2. **Insufficiency of Model Transfer**:
   - Model transfer methods train a model on a related source domain and then finetune it on a small target dataset. Their effectiveness depends on how closely the source and target domains are related, so they are of limited use when no suitable source dataset exists or when there are not enough resources to train a source model from scratch.
3. **Advantages and Challenges of T2I Models**:
   - Pretrained T2I models can serve as general-purpose source models for finetuning, but existing finetuning work mainly targets relatively rich datasets (tens or hundreds of images) or datasets of a single object or person. Few-shot domain-driven image generation has not been fully explored.

### Solution

1. **Attribute-centric Finetuning**: DomainGallery combines four techniques, each aimed at a specific issue in few-shot domain-driven generation.
   - **Prior Attribute Erasure**: Removes the unexpected prior attributes that an identifier word may already carry in the pretrained model.
   - **Attribute Disentanglement**: Prevents leakage between domain attributes and category attributes.
   - **Attribute Regularization**: Reduces overfitting and improves the quality of generated images.
   - **Attribute Enhancement**: Strengthens domain attributes during cross-category generation, improving the fidelity of generated images.
2. **Application Scenarios**:
   - **Intra-category Generation**: Generates images that carry both the category attributes and the domain attributes of the target dataset.
   - **Cross-category Generation**: Generates images of other categories, specified by text, while retaining the domain attributes.
   - **Additional Attributes**: Adds extra attributes to the generated images.
   - **Personalized Generation**: Combines domain-driven and subject-driven generation to achieve better personalization.

### Experimental Results

- **Intra-category Generation**: DomainGallery outperforms baseline methods in terms of fidelity and diversity.
- **Cross-category Generation**: Through prior attribute erasure and attribute disentanglement, DomainGallery avoids the attribute leakage seen in other methods, resulting in higher-quality generated images.
- **Additional Attributes**: DomainGallery can add extra attributes to generated images without disrupting the original text-image structure.

### Conclusion

DomainGallery achieves high-quality domain-specific image generation under few-shot conditions through attribute-centric finetuning. It addresses problems left open by existing methods, namely leakage between domain and category attributes and unwanted prior attributes of the identifier word, and it performs well across the intra-category, cross-category, extra-attribute, and personalized generation scenarios.
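
As an illustration of the application scenarios described above, the following minimal sketch shows how text prompts for intra-category, cross-category, and extra-attribute generation might be assembled once a model has been finetuned. It assumes a DreamBooth-style convention in which a placeholder identifier word (written `[V]` here) stands for the domain attributes and a plain noun stands for the category. The identifier string, the prompt template, and every name in the code (`DomainPrompt`, `build_prompts`) are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass

# Hypothetical identifier word assumed to be bound to the domain attributes
# during finetuning; the actual token used by DomainGallery may differ.
IDENTIFIER = "[V]"


@dataclass(frozen=True)
class DomainPrompt:
    """A prompt made of the identifier word, a category noun, and optional extras."""

    category: str    # category attributes, e.g. "face" or "cat"
    extra: str = ""  # optional additional attributes, e.g. "wearing a hat"

    def text(self) -> str:
        # Assumed template: "a [V] <category> <extra attributes>"
        prompt = f"a {IDENTIFIER} {self.category}"
        if self.extra:
            prompt = f"{prompt} {self.extra}"
        return prompt


def build_prompts(source_category: str, target_category: str, extra: str) -> dict:
    """Build one example prompt per generation scenario."""
    return {
        # Same category as the few-shot target dataset, same domain.
        "intra_category": DomainPrompt(source_category).text(),
        # A different category, keeping the domain attributes.
        "cross_category": DomainPrompt(target_category).text(),
        # A different category plus an additional attribute.
        "extra_attribute": DomainPrompt(target_category, extra).text(),
    }


prompts = build_prompts("face", "cat", "wearing a hat")
for scenario, prompt in prompts.items():
    print(f"{scenario}: {prompt}")
```

Each resulting string would then be passed as the text prompt to the finetuned model. Personalized generation is left out of the sketch because it would additionally require a subject-specific identifier.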