Zero-shot Generative Model Adaptation via Image-specific Prompt Learning

Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang

2023-04-06

Abstract: Recently, CLIP-guided image synthesis has shown appealing performance on adapting a pre-trained source-domain generator to an unseen target domain. It does not require any target-domain samples but only the textual domain labels. The training is highly efficient, e.g., a few minutes. However, existing methods still have some limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied for all cross-domain image pairs, which leads to identical supervision signals. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates the mode collapse. Moreover, IPL is independent of the structure of the generative model, such as generative adversarial networks or diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper primarily aims to address mode collapse in zero-shot generative model adaptation. Specifically:

1. **Problems with existing methods**:
   - Existing methods such as NADA are efficient to train, but they fall short in image quality and diversity. In particular, images generated for different target domains often exhibit similar undesirable patterns, known as the mode collapse problem.
   - The root cause is that existing methods use a fixed adaptation direction, so all cross-domain image pairs share the same supervision signal.
2. **Proposed method**:
   - To overcome these issues, the authors propose Image-specific Prompt Learning (IPL), which learns specific prompt vectors for each source-domain image to produce more precise and diverse adaptation directions.
   - IPL is divided into two stages: the first stage trains a latent mapper to generate image-specific prompt vectors; the second stage uses these prompt vectors to guide the training of the target-domain generator.

In this way, the paper aims to improve the quality and diversity of generated images and to alleviate mode collapse.
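The contrast between a fixed adaptation direction and image-specific directions can be illustrated with a small toy sketch. It is not the authors' implementation: the summary does not state the exact loss, so the sketch assumes a directional loss of the form 1 - cos(ΔI, ΔT) between the image-embedding change and the text-embedding change, and it uses hand-written 3-dimensional vectors in place of real CLIP embeddings. All function names (`directional_loss`, `fixed_directions`, `image_specific_directions`) and the toy prompt embeddings are hypothetical.

```python
import math


def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def directional_loss(src_img, tgt_img, text_direction):
    """Assumed loss form: 1 - cos(image change, text direction)."""
    return 1.0 - cosine(sub(tgt_img, src_img), text_direction)


def fixed_directions(n_images, src_label_emb, tgt_label_emb):
    """Fixed scheme: every image pair gets the same direction,
    computed once from the two domain-label embeddings."""
    direction = sub(tgt_label_emb, src_label_emb)
    return [list(direction) for _ in range(n_images)]


def image_specific_directions(src_prompt_embs, tgt_prompt_embs):
    """Image-specific scheme: one direction per image, computed from
    that image's own source and target prompt embeddings. In this toy
    sketch the per-image embeddings are simply given; in IPL they would
    come from prompts produced by the stage-one latent mapper."""
    return [sub(t, s) for s, t in zip(src_prompt_embs, tgt_prompt_embs)]


# Toy stand-ins for text embeddings of the domain labels (assumption).
src_label = [1.0, 0.0, 0.0]
tgt_label = [0.0, 1.0, 0.0]

# Toy stand-ins for per-image prompt embeddings (assumption).
src_prompts = [[1.0, 0.0, 0.25], [1.0, 0.0, -0.25]]
tgt_prompts = [[0.0, 1.0, 0.75], [0.0, 1.0, -0.75]]

fixed = fixed_directions(2, src_label, tgt_label)
specific = image_specific_directions(src_prompts, tgt_prompts)

# An image change aligned with the text direction gives a loss near 0;
# an orthogonal change gives a loss of 1.
aligned = directional_loss([0.0, 0.0, 0.0], [-2.0, 2.0, 0.0], fixed[0])
orthogonal = directional_loss([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], fixed[0])
```

With the fixed scheme, both entries of `fixed` are the same vector, so every image pair is pushed toward an identical target. With the image-specific scheme, the entries of `specific` differ, which is the property the summary credits for more diverse supervision.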

Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation

Improving Zero-Shot Generalization for CLIP with Synthesized Prompts

Few-shot Generative Model Adaptation via Style-Guided Prompt

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

One-shot Generative Domain Adaptation in 3D GANs

Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting

PØDA: Prompt-driven Zero-shot Domain Adaptation

Zero-Shot Learning with Generative Latent Prototype Model

Diverse and Tailored Image Generation for Zero-shot Multi-label Classification

A Joint Generative Model For Zero-Shot Learning

One-Shot Adaptation of GAN in Just One CLIP

Towards Diverse and Faithful One-shot Adaption of Generative Adversarial Networks

Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

Cross-modal propagation network for generalized zero-shot learning

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

No Token Left Behind: Explainability-Aided Image Classification and Generation

EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance

Domain Re-Modulation for Few-Shot Generative Domain Adaptation