Intelligent Painter: Picture Composition With Resampling Diffusion Model

Wing-Fung Ku, Wan-Chi Siu, Xi Cheng, H. Anthony Chan
DOI: https://doi.org/10.48550/arXiv.2210.17106
2023-07-04
Abstract: Have you ever thought that you can be an intelligent painter? This means that you can paint a picture with a few expected objects in mind, or with a desirable scene. This is different from normal inpainting approaches, for which the location of specific objects cannot be determined. In this paper, we present an intelligent painter that generates a person's imaginary scene in one go, given explicit hints. We propose a resampling strategy for the Denoising Diffusion Probabilistic Model (DDPM) to intelligently compose unconditional harmonized pictures according to the input subjects at specific locations. By exploiting the diffusion property, we resample efficiently to produce realistic pictures. Experimental results show that our resampling method favors the semantic meaning of the generated output efficiently and generates less blurry output. Quantitative analysis of image quality assessment shows that our method produces higher perceptual quality images compared with the state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses **controllable image synthesis**: generating high-quality, semantically consistent images from key elements (such as specific objects or scenes) supplied by the user. Specifically, the authors propose a method that lets users generate the scene they have in mind by providing explicit hints, for example the positions of certain objects.

### Main Problem Description

1. **Limitations of Traditional Image Inpainting and Generation Methods**:
   - GAN-based models (iGAN, GANBrush, PoE-GAN, etc.) rely on image priors, which limits the types of user input they accept.
   - Early image inpainting methods (such as those based on large-database matching or neighboring-pixel interpolation) can only handle small missing regions and have limited effectiveness.
   - Deep-learning methods can handle large missing regions, but they still struggle to maintain semantic consistency and to generate high-quality images.
2. **Gap between User Requirements and Existing Technologies**:
   - Users want to generate a complete, high-quality image just by providing a few key elements or scene concepts, like an "intelligent painter".
   - Existing methods find it difficult to generate images that match the user's intent efficiently while maintaining image quality.

### Solutions

To solve the above problems, the authors propose the following:

1. **Introducing a Resampling Strategy**: The Denoising Diffusion Probabilistic Model (DDPM) is extended with a resampling strategy that better guides noise removal during generation, yielding higher-quality images.
2. **Combining Explicit Landmark Information**: The explicit landmark information provided by the user (such as object positions) guides the generation process, so that the generated images are more semantically consistent.
3. **Optimizing Inference Time**: The resampling time and number of steps are adjusted to reduce inference time while maintaining image quality.

### Formula Representation

- Encoding the known input information:
  \[ z_{\text{known}} = \text{Encode}(x_{\text{known}}) \]
- Decoding the combined latent variables:
  \[ y = \text{Decode}(z_{\text{known}} + z_{\text{unknown}}) \]
- Adding noise in one step of the forward process:
  \[ q(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I) \]
- Accumulated forward noise, where \(\alpha_t = 1-\beta_t\), \(\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s\), and \(\epsilon \sim \mathcal{N}(0, I)\):
  \[ x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon \]
- Denoising in the reverse process:
  \[ p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)) \]

### Summary

The main contribution of this paper is a DDPM-based method that combines explicit landmark information with a resampling strategy. It reduces inference time while still generating high-quality, semantically consistent images, and it offers a starting point for future work on controllable image generation.
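
The two forward-process formulas listed under Formula Representation can be checked numerically. The sketch below is illustrative only and is not the authors' code: the linear schedule, its endpoint values, the step count, and all function names are assumptions. It uses the standard DDPM definition \(\bar{\alpha}_t = \prod_{s \le t} (1-\beta_s)\) and compares applying the one-step kernel \(q(x_t \mid x_{t-1})\) repeatedly against drawing \(x_t\) directly from \(x_0\) with the closed form.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T (linear schedule; values are assumptions)."""
    return np.linspace(beta_start, beta_end, num_steps)


def cumulative_alpha(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s), stored at index t-1."""
    return np.cumprod(1.0 - betas)


def q_step(x_prev, t, betas, rng):
    """One forward step: sample x_t ~ N(sqrt(1 - beta_t) x_{t-1}, beta_t I)."""
    beta_t = betas[t - 1]
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise


def q_sample(x0, t, alpha_bars, rng):
    """Closed form: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps."""
    a_bar = alpha_bars[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps


rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha(betas)

x0 = np.ones(100_000)  # toy "image": every pixel equals 1
t = 200

# Route 1: apply the one-step kernel t times.
x_iter = x0
for s in range(1, t + 1):
    x_iter = q_step(x_iter, s, betas, rng)

# Route 2: jump straight to step t with the closed form.
x_closed = q_sample(x0, t, alpha_bars, rng)

# Both routes should have mean close to sqrt(alpha_bar_t)
# and variance close to 1 - alpha_bar_t.
print(x_iter.mean(), x_closed.mean(), np.sqrt(alpha_bars[t - 1]))
print(x_iter.var(), x_closed.var(), 1.0 - alpha_bars[t - 1])
```

Because both routes produce samples from the same distribution, a sampler can move from any step to a noisier one without replaying every intermediate step. That is the diffusion property a resampling scheme can exploit, although the specific resampling schedule used in the paper is not reproduced here.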