Abstract: Have you ever thought that you could be an intelligent painter? That is, you could paint a picture with a few expected objects in mind, or with a desired scene. This differs from normal inpainting approaches, in which the location of specific objects cannot be determined. In this paper, we present an intelligent painter that generates a person's imaginary scene in one go, given explicit hints. We propose a resampling strategy for the Denoising Diffusion Probabilistic Model (DDPM) to intelligently compose unconditional harmonized pictures according to the input subjects at specific locations. By exploiting the diffusion property, we resample efficiently to produce realistic pictures. Experimental results show that our resampling method efficiently preserves the semantic meaning of the generated output and produces less blurry results. Quantitative image quality assessment shows that our method produces images of higher perceptual quality than state-of-the-art methods.
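The resampling idea in the abstract can be illustrated with a minimal NumPy sketch. It follows the RePaint-style scheme rather than the authors' exact procedure, which the abstract does not spell out. At each reverse step, the user-supplied pixels are noised to the current level with the closed-form forward process, the rest of the image is denoised by one DDPM step, and the two are merged with a binary mask. The merged sample is then pushed back one noise level and denoised again, `jumps` times, so the regions can harmonize. The names `compose_with_resampling`, `eps_model`, `mask`, and `jumps`, the linear beta schedule, and the zero-output stand-in denoiser are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Linear schedule from the original DDPM paper; an assumption, not the authors' setting.
    return np.linspace(beta_start, beta_end, T)


def compose_with_resampling(x_known, mask, eps_model, T=50, jumps=3, seed=0):
    """Hypothetical RePaint-style composition: mask == 1 marks user-supplied pixels."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x_t = rng.standard_normal(x_known.shape)  # start from pure Gaussian noise
    n_calls = 0
    for t in range(T - 1, -1, -1):
        n_iter = jumps if t > 0 else 1  # no resampling at the final, noise-free step
        for u in range(n_iter):
            # Known region: noise the user's pixels to level t-1 in closed form.
            if t > 0:
                ab = alpha_bars[t - 1]
                known = np.sqrt(ab) * x_known + np.sqrt(1.0 - ab) * rng.standard_normal(x_known.shape)
            else:
                known = x_known
            # Unknown region: one reverse DDPM step with sigma_t^2 = beta_t.
            eps = eps_model(x_t, t)
            n_calls += 1
            mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
            noise = rng.standard_normal(x_t.shape) if t > 0 else 0.0
            unknown = mean + np.sqrt(betas[t]) * noise
            # Merge the two regions with the binary mask.
            x_prev = mask * known + (1.0 - mask) * unknown
            if u < n_iter - 1:
                # Resample: diffuse x_{t-1} back to x_t via q(x_t | x_{t-1}) and denoise again.
                x_t = np.sqrt(alphas[t]) * x_prev + np.sqrt(betas[t]) * rng.standard_normal(x_prev.shape)
        x_t = x_prev
    return x_t, n_calls


if __name__ == "__main__":
    x_known = np.ones((4, 4))
    mask = np.zeros((4, 4))
    mask[:2, :] = 1.0  # the top half is given by the user
    dummy_eps = lambda x, t: np.zeros_like(x)  # placeholder for a trained noise predictor
    image, calls = compose_with_resampling(x_known, mask, dummy_eps, T=10, jumps=2)
    print(image.round(2))
    print("denoiser calls:", calls)  # (T - 1) * jumps + 1 = 19
```

Each extra jump costs one more denoiser call per timestep, so the returned call count grows roughly as `T * jumps`. This is the cost that adjusting the resampling time and steps trades against image quality.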

What problem does this paper attempt to address?

The problem this paper attempts to solve is **controllable image synthesis**: in particular, how to generate high-quality, semantically consistent images from key elements supplied by the user (such as specific objects or scenes). Specifically, the authors propose a method that lets users generate the scene they have in mind by supplying a few explicit hints (for example, the positions of certain objects).

### Main Problem Description

1. **Limitations of Traditional Image Inpainting and Generation Methods**:
   - GAN-based models (iGAN, GANBrush, PoE-GAN, etc.) rely on image priors, which limit the types of user input they accept.
   - Early image inpainting methods (such as those based on matching against large image databases or interpolating from neighboring pixels) can only handle small missing regions and have limited effectiveness.
   - Although deep-learning methods can handle large missing regions, they still struggle to maintain semantic consistency and to produce high-quality images.
2. **Gap between User Requirements and Existing Technologies**:
   - Users want to generate a complete, high-quality image just by providing a few key elements or a scene concept, like an "intelligent painter".
   - Existing methods struggle to generate images that match user intentions efficiently while maintaining image quality.

### Solutions

To address these problems, the authors propose the following:

1. **Resampling Strategy**: Building on the Denoising Diffusion Probabilistic Model (DDPM), a resampling strategy better guides noise removal during generation, yielding higher-quality images.
2. **Explicit Landmark Information**: Landmark information supplied by the user (such as object positions) guides the generation process, making the generated images more semantically consistent.
3. **Optimized Inference Time**: Adjusting the resampling time and number of steps reduces inference time while maintaining image quality.

### Formula Representation

- Encoding the known input information:
  \[ z_{\text{known}}=\text{Encode}(x_{\text{known}}) \]
- Decoding the combined latent variables:
  \[ y = \text{Decode}(z_{\text{known}}+z_{\text{unknown}}) \]
- Adding noise in the forward process:
  \[ q(x_t \mid x_{t-1})=\mathcal{N}\big(x_t;\sqrt{1-\beta_t}\,x_{t-1},\beta_t I\big) \]
- Closed form of the accumulated forward noise, with \(\alpha_t = 1-\beta_t\), \(\bar{\alpha}_t=\prod_{s=1}^{t}\alpha_s\), and \(\epsilon\sim\mathcal{N}(0,I)\):
  \[ x_t=\sqrt{\bar{\alpha}_t}\,x_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon \]
- Denoising in the reverse process:
  \[ p_\theta(x_{t-1} \mid x_t)=\mathcal{N}\big(x_{t-1};\mu_\theta(x_t,t),\Sigma_\theta(x_t,t)\big) \]

### Summary

The main contribution of this paper is a DDPM-based method that combines explicit landmark information with a resampling strategy. It reduces inference time while maintaining image quality and generates semantically consistent images, offering a new direction for controllable image generation.
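The forward and reverse formulas listed under Formula Representation can be checked numerically with a short sketch. It assumes a linear beta schedule and 0-based timestep indexing, so `alpha_bars[t]` is the product of the alphas up to and including step `t`. The code composes q(x_t | x_{t-1}) step by step and confirms it reproduces the closed-form scale sqrt(alpha_bar_t) and variance 1 - alpha_bar_t. It also implements the reverse-process mean mu_theta in the standard DDPM noise-prediction parameterization, which is an assumption here rather than something the summary states. All function names are hypothetical.

```python
import numpy as np


def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (an assumption); alpha_t = 1 - beta_t, alpha_bar_t = running product.
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    # Closed-form forward process: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps.
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps


def stepwise_moments(betas, t):
    # Compose q(x_s | x_{s-1}) = N(sqrt(1 - beta_s) x_{s-1}, beta_s I) for s = 0..t,
    # tracking the scale on x_0 and the accumulated variance.
    scale, var = 1.0, 0.0
    for b in betas[: t + 1]:
        scale *= np.sqrt(1.0 - b)
        var = (1.0 - b) * var + b
    return scale, var


def p_mean(x_t, t, eps_hat, betas, alphas, alpha_bars):
    # Reverse-process mean mu_theta(x_t, t) when the network predicts the noise eps_hat.
    return (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])


betas, alphas, alpha_bars = make_schedule(T=100)
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
t = 40
x_t, eps = q_sample(x0, t, alpha_bars, rng)

scale, var = stepwise_moments(betas, t)
print(np.isclose(scale, np.sqrt(alpha_bars[t])), np.isclose(var, 1.0 - alpha_bars[t]))  # True True

# With the true noise, mu_theta equals the Gaussian posterior mean of q(x_{t-1} | x_t, x_0).
posterior_mean = (np.sqrt(alpha_bars[t - 1]) * betas[t] / (1.0 - alpha_bars[t]) * x0
                  + np.sqrt(alphas[t]) * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * x_t)
print(np.allclose(p_mean(x_t, t, eps, betas, alphas, alpha_bars), posterior_mean))  # True
```

The posterior-mean check shows why the noise-prediction form is convenient: a network that predicted the noise exactly would recover the true mean of each reverse step.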

Intelligent Painter: Picture Composition With Resampling Diffusion Model

Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching

Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model

A Hybrid Inpainting Model Combining Diffusion and Enhanced Exemplar Methods

Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap

Coherent and Multi-modality Image Inpainting via Latent Space Optimization

TD-Paint: Faster Diffusion Inpainting Through Time Aware Pixel Conditioning

Image Inpainting Based on Interactive Separation Network and Progressive Reconstruction Algorithm

RePaint: Inpainting using Denoising Diffusion Probabilistic Models

SePaint: Semantic Map Inpainting via Multinomial Diffusion

A Diffusion Model with A FFT for Image Inpainting

Intelli-Paint: Towards Developing Human-like Painting Agents

PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control

DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models

Diffusion Model-Based Image Editing: A Survey

GradPaint: Gradient-guided inpainting with diffusion models

Painterly Image Harmonization using Diffusion Model