Abstract: The degree of difficulty in image inpainting depends on the types and sizes of the missing parts. Existing image inpainting approaches usually encounter difficulties in completing the missing parts in the wild with pleasing visual and contextual results as they are trained for either dealing with one specific type of missing patterns (mask) or unilaterally assuming the shapes and/or sizes of the masked areas. We propose a deep generative inpainting network, named DeepGIN, to handle various types of masked images. We design a Spatial Pyramid Dilation (SPD) ResNet block to enable the use of distant features for reconstruction. We also employ Multi-Scale Self-Attention (MSSA) mechanism and Back Projection (BP) technique to enhance our inpainting results. Our DeepGIN outperforms the state-of-the-art approaches generally, including two publicly available datasets (FFHQ and Oxford Buildings), both quantitatively and qualitatively. We also demonstrate that our model is capable of completing masked images in the wild.

What problem does this paper attempt to address?

The paper addresses the difficulty that existing methods have with missing regions of varying shapes and sizes in image inpainting. Existing methods can usually handle only specific types of missing patterns (such as rectangular masks), or they unilaterally assume the shape and/or size of the masked area. As a result, they struggle to produce satisfactory visual and contextual results in practical applications. To overcome these limitations, the paper proposes a Deep Generative Inpainting Network (DeepGIN), which aims to handle various types of masked images and to fill in the missing parts while maintaining visual and semantic coherence.

### Main Contributions

1. **Spatial Pyramid Dilation (SPD) Block**: Handles masks of different shapes and sizes. By using several receptive fields, information from both nearby and distant spatial locations can contribute to the prediction of a local missing region.
2. **Multi-Scale Self-Attention (MSSA) Mechanism**: Exploits the self-similarity of the image and improves the coherence of the inpainting results through self-attention at multiple scales.
3. **Back Projection (BP) Technique**: Improves the alignment between the generated patterns and the reference ground-truth image, thereby improving the quality of the inpainted image.

### Method Overview

DeepGIN consists of two stages:

- **Coarse Reconstruction Stage**: The coarse generator \(G_1\) makes a preliminary estimate of the missing pixels and produces a coarse inpainted image \(I_{\text{coarse}}\).
- **Refinement Stage**: The refinement generator \(G_2\) refines the details of the coarse inpainted image and produces the completed image \(I_{\text{out}}\).
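
The receptive-field idea behind the SPD block can be illustrated with a small, library-free sketch. It is not the authors' implementation: the dilation rates `ASSUMED_RATES` and the helper names are hypothetical, since the summary does not state which rates DeepGIN uses. It shows only the standard fact that a kernel of size k with dilation d spans k + (k - 1)(d - 1) positions, so parallel branches with larger dilation reach more distant features.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of positions spanned along one axis by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation):
    """'Valid' 1-D dilated cross-correlation written in plain Python."""
    span = effective_kernel_size(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Sample the input every `dilation` steps instead of contiguously.
        taps = [signal[start + i * dilation] for i in range(len(weights))]
        out.append(sum(w * t for w, t in zip(weights, taps)))
    return out


# Hypothetical dilation rates, chosen for illustration only.
ASSUMED_RATES = (1, 2, 4, 8)


def pyramid_spans(kernel_size=3, rates=ASSUMED_RATES):
    """Span of each parallel branch in a pyramid of dilated kernels."""
    return {d: effective_kernel_size(kernel_size, d) for d in rates}


if __name__ == "__main__":
    print(pyramid_spans())
    print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 0, -1], dilation=2))
```

With a 3-wide kernel, the assumed rates give branches spanning 3, 5, 9, and 17 positions, which is one way a single block can mix nearby and distant context.
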
### Network Architecture

- **Coarse Reconstruction Stage**: Adopts an encoder-decoder structure and uses SPD ResNet blocks with different dilation rates to expand the receptive field.
- **Refinement Stage**: Also adopts an encoder-decoder structure, but adds MSSA blocks and the BP technique to improve the coherence and alignment of the inpainting results.

### Loss Function

The loss function includes five main terms:

- **L1 Loss**: Ensures pixel-level reconstruction accuracy.
- **Adversarial Loss**: Pushes the distribution of generated images toward that of real images.
- **Perceptual Loss**: Encourages the generated image and the reference real image to be similar in feature representation.
- **Style Loss**: Emphasizes the style similarity between the generated image and the real image.
- **Total Variation Loss**: Serves as a regularization term to ensure the smoothness of the generated image.

### Experimental Results

The paper reports experiments on multiple datasets, including the FFHQ and Oxford Buildings datasets. The results show that DeepGIN outperforms existing state-of-the-art methods both quantitatively and qualitatively, especially on images with complex and irregular masks.

### Conclusion

DeepGIN significantly improves the quality and robustness of image inpainting by introducing the SPD, MSSA, and BP techniques, and can produce satisfactory visual and contextual results for masks of various types and sizes.
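
As a rough illustration of how the five loss terms listed above might be combined, the sketch below forms a weighted sum. Everything specific in it is an assumption: the weights in `ASSUMED_WEIGHTS` are placeholders rather than values from the paper, the total variation uses the common anisotropic form, and the adversarial, perceptual, and style terms are passed in as precomputed numbers because they require a discriminator and a pretrained feature network.

```python
def l1_loss(pred, target):
    """Mean absolute pixel difference between two equally sized 2-D images."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def total_variation(img):
    """Anisotropic total variation: summed absolute neighbour differences."""
    h, w = len(img), len(img[0])
    horiz = sum(abs(img[y][x + 1] - img[y][x]) for y in range(h) for x in range(w - 1))
    vert = sum(abs(img[y + 1][x] - img[y][x]) for y in range(h - 1) for x in range(w))
    return horiz + vert


# Placeholder weights for illustration only; not taken from the paper.
ASSUMED_WEIGHTS = {"l1": 1.0, "adv": 0.1, "perc": 0.1, "style": 1.0, "tv": 0.01}


def total_loss(pred, target, adv, perc, style, weights=ASSUMED_WEIGHTS):
    """Weighted sum of the five terms; adv, perc, style are precomputed scalars."""
    terms = {
        "l1": l1_loss(pred, target),
        "adv": adv,
        "perc": perc,
        "style": style,
        "tv": total_variation(pred),
    }
    return sum(weights[name] * value for name, value in terms.items())


if __name__ == "__main__":
    pred = [[0.0, 1.0], [1.0, 0.0]]
    target = [[0.0, 0.0], [1.0, 1.0]]
    print(total_loss(pred, target, adv=0.0, perc=0.0, style=0.0))
```

In a real training loop these terms would be computed on tensors by a deep learning framework; the plain-Python version only makes the structure of the objective explicit.
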

DeepGIN: Deep Generative Inpainting Network for Extreme Image Inpainting

Image Inpainting Based on Interactive Separation Network and Progressive Reconstruction Algorithm

Face Image Inpainting Based on Generative Adversarial Network

A Progressive Image Inpainting Algorithm with a Mask Auto-update Branch

Deep Inception Generative Network for Cognitive Image Inpainting

Progressive Inpainting Strategy with Partial Convolutions Generative Networks (PPCGN)

Free-Form Image Inpainting with Separable Gate Encoder-Decoder Network

Pyramid-VAE-GAN: Transferring Hierarchical Latent Variables for Image Inpainting

Image Inpainting by End-to-End Cascaded Refinement With Mask Awareness

High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling

DGCA: high resolution image inpainting via DR-GAN and contextual attention

A Progressive and Multi-Prior-Guided Network for Image Inpainting

Inpainting with Separable Mask Update Convolution Network

Semantic Residual Pyramid Network for Image Inpainting

DE-GAN: Domain Embedded GAN for High Quality Face Image Inpainting

Dual-Pyramidal Image Inpainting with Dynamic Normalization

Texture Memory-Augmented Deep Patch-Based Image Inpainting

Deep Multi-Resolution Mutual Learning for Image Inpainting

Distillation-guided Image Inpainting

Region-wise Generative Adversarial Image Inpainting for Large Missing Areas

Semantic Image Inpainting with Multi-Stage Feature Reasoning Generative Adversarial Network