Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang
2023-04-06
Abstract: Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper primarily aims to address mode collapse in zero-shot generative model adaptation. Specifically:

1. **Problems with existing methods**:
   - Existing methods such as NADA are efficient to train, but they fall short in image quality and diversity. In particular, the generated images often exhibit similar undesirable patterns, known as the mode collapse problem.
   - The root cause is that existing methods use a fixed adaptation direction, so all cross-domain image pairs share the same supervision signal.
2. **Proposed method**:
   - To overcome these issues, the authors propose Image-specific Prompt Learning (IPL), which learns specific prompt vectors for each source-domain image to produce more precise and diverse adaptation directions.
   - IPL has two stages: the first stage trains a latent mapper to generate image-specific prompt vectors; the second stage uses these prompt vectors to guide the training of the target-domain generator.

Through this approach, the paper aims to improve the quality and diversity of generated images and to alleviate mode collapse.
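
The contrast between a fixed adaptation direction and image-specific directions can be illustrated with a small sketch. This is a toy example, not the authors' implementation. It assumes a cosine-based directional loss of the kind commonly used in CLIP-guided adaptation, and it uses random vectors as stand-ins for CLIP image and text embeddings. All names (`directional_loss`, `fixed_dir`, `specific_dirs`, and so on) are hypothetical.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def directional_loss(img_src, img_tgt, text_dir):
    """Mean of 1 - cos(image-space direction, text-space direction).

    text_dir may have shape (dim,) for one shared direction or
    (batch, dim) for one direction per image pair.
    """
    img_dir = normalize(img_tgt - img_src)
    text_dir = normalize(np.broadcast_to(text_dir, img_dir.shape))
    return float(np.mean(1.0 - np.sum(img_dir * text_dir, axis=-1)))


rng = np.random.default_rng(0)
batch, dim = 4, 8

# Random stand-ins for embeddings of source images and their adapted versions.
img_src = rng.normal(size=(batch, dim))
img_tgt = rng.normal(size=(batch, dim))

# Fixed direction: one text-space direction shared by every image pair,
# so every pair receives the same supervision target.
txt_src = rng.normal(size=dim)
txt_tgt = rng.normal(size=dim)
fixed_dir = txt_tgt - txt_src

# Image-specific directions: one direction per image pair, standing in for
# directions derived from per-image prompt vectors (assumed, not learned here).
prompt_src = rng.normal(size=(batch, dim))
prompt_tgt = rng.normal(size=(batch, dim))
specific_dirs = prompt_tgt - prompt_src

loss_fixed = directional_loss(img_src, img_tgt, fixed_dir)
loss_specific = directional_loss(img_src, img_tgt, specific_dirs)
print("fixed-direction loss:", loss_fixed)
print("image-specific loss:", loss_specific)
```

In the fixed case every row of the batch is pulled toward the same target direction, which is the "identical supervision signals" situation described above. In the image-specific case each row has its own target, which is the extra flexibility that IPL is meant to provide.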