Improving image synthesis with diffusion-negative sampling

Alakh Desai, Nuno Vasconcelos
2024-11-08
Abstract: For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n*. The pair (p, n*) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to improve image quality and prompt adherence through negative prompting when generating images with diffusion models (DMs). Specifically:

1. **Challenges of negative prompting**: A negative prompt helps define attributes that should not appear in the synthesized image, which improves image quality and prompt adherence, but finding an effective negative prompt is difficult. The paper attributes this to a semantic gap between human users and diffusion models: negative prompts that are intuitive to humans may not be effective for the model.
2. **The semantic gap**: Diffusion models handle concept negation differently from humans, so negative prompts that humans consider appropriate may perform poorly. For example, for the positive prompt "an airplane standing on the runway", negative prompts such as "flying" or "soaring" may not noticeably improve the synthesized image in some cases.
3. **The proposed method**: To address this, the authors introduce a new negative prompting strategy, diffusion-negative prompting (DNP). DNP is based on a new procedure called diffusion-negative sampling (DNS), which samples an image that is least compliant with the positive prompt under the distribution of the DM. That image is then translated into natural language, by the user or a captioning model, to produce the negative prompt n*, and the pair (p, n*) is used to prompt the DM.
4. **Experimental verification**: Experiments show that DNP improves quantitative metrics (such as CLIP score) and yields better image quality and prompt adherence in human evaluations.
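
To make the role of a negative prompt concrete, the sketch below shows the guidance arithmetic commonly used in Stable Diffusion-style pipelines, where the noise prediction for the negative prompt takes the place of the unconditional prediction. The second function is only an assumption about how "least compliant with p" sampling might be approximated (by reversing the guidance direction); the summary above does not specify the DNS update rule, and the function names are illustrative.

```python
from typing import List

Vector = List[float]


def guided_noise(eps_pos: Vector, eps_neg: Vector, scale: float) -> Vector:
    """Classifier-free guidance with a negative branch.

    eps_pos: noise predicted when conditioning on the positive prompt p.
    eps_neg: noise predicted when conditioning on the negative prompt n
             (or on the empty prompt when no negative prompt is given).
    scale:   guidance scale; scale=1 returns eps_pos, larger values move
             further away from eps_neg.
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def dns_noise_sketch(eps_pos: Vector, eps_uncond: Vector, scale: float) -> Vector:
    """ASSUMPTION, not the paper's stated rule: approximate sampling of
    images least compliant with p by stepping *away* from the
    p-conditioned prediction, i.e. guidance with the sign reversed.
    """
    return [u - scale * (p - u) for p, u in zip(eps_pos, eps_uncond)]
```

In a real sampler these vectors would be the denoiser's outputs at one timestep; plain lists are used here only to keep the arithmetic visible.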
In addition, DNP can be combined with several diffusion model variants and can be automated (auto-DNP) with a pre-trained image captioning model, which further simplifies the user's workflow.

In summary, the paper's main contribution is DNP, a new negative prompting strategy that bridges the semantic gap between human users and diffusion models and thereby improves the quality and prompt adherence of synthesized images.
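
The auto-DNP flow described above (sample a diffusion-negative image, caption it, then prompt with the resulting pair) can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions: `sample_negative_image`, `caption_image`, and `generate` are hypothetical callables standing in for a DNS sampler, a pre-trained captioning model, and the diffusion model; none of them is an API from the paper or from any specific library.

```python
from typing import Callable, Tuple, TypeVar

Image = TypeVar("Image")


def auto_dnp(
    prompt: str,
    sample_negative_image: Callable[[str], Image],
    caption_image: Callable[[Image], str],
    generate: Callable[[str, str], Image],
) -> Tuple[Image, str]:
    """Run the three DNP steps and return (final image, negative prompt).

    All three callables are hypothetical placeholders supplied by the caller.
    """
    # Step 1 (DNS): sample an image that is least compliant with the prompt p.
    negative_image = sample_negative_image(prompt)
    # Step 2: translate that image into natural language to obtain n*.
    negative_prompt = caption_image(negative_image)
    # Step 3: prompt the diffusion model with the pair (p, n*).
    image = generate(prompt, negative_prompt)
    return image, negative_prompt
```

In the manual variant of DNP, the user would write the caption instead, which here corresponds to passing a `caption_image` callable that returns the user's text.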