DeepGIN: Deep Generative Inpainting Network for Extreme Image Inpainting

Chu-Tak Li, Wan-Chi Siu, Zhi-Song Liu, Li-Wen Wang, Daniel Pak-Kong Lun
DOI: https://doi.org/10.48550/arXiv.2008.07173
2020-08-17
Abstract: The degree of difficulty in image inpainting depends on the types and sizes of the missing parts. Existing image inpainting approaches usually encounter difficulties in completing the missing parts in the wild with pleasing visual and contextual results as they are trained for either dealing with one specific type of missing patterns (mask) or unilaterally assuming the shapes and/or sizes of the masked areas. We propose a deep generative inpainting network, named DeepGIN, to handle various types of masked images. We design a Spatial Pyramid Dilation (SPD) ResNet block to enable the use of distant features for reconstruction. We also employ Multi-Scale Self-Attention (MSSA) mechanism and Back Projection (BP) technique to enhance our inpainting results. Our DeepGIN outperforms the state-of-the-art approaches generally, including two publicly available datasets (FFHQ and Oxford Buildings), both quantitatively and qualitatively. We also demonstrate that our model is capable of completing masked images in the wild.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the difficulty that existing image inpainting methods have with missing regions of varied shapes and sizes. Existing methods are usually trained for one specific type of missing pattern (such as rectangular masks), or they unilaterally assume the shape and/or size of the masked area, so they struggle to produce satisfactory visual and contextual results in practical applications. To overcome these limitations, the paper proposes a Deep Generative Inpainting Network (DeepGIN) that aims to handle various types of masked images and to fill in the missing parts while maintaining visual and semantic coherence.

### Main Contributions

1. **Spatial Pyramid Dilation (SPD) Block**: An SPD ResNet block designed to handle masks of different shapes and sizes. By using different receptive fields, it lets information from both nearby and distant spatial locations contribute to the prediction of local missing regions.
2. **Multi-Scale Self-Attention (MSSA) Mechanism**: Exploits the self-similarity of the image itself to enhance the coherence of the inpainting results.
3. **Back Projection (BP) Technique**: A back-projection strategy that gives better alignment between the generated patterns and the reference ground-truth image, thereby improving the quality of the inpainted image.

### Method Overview

DeepGIN consists of two stages:

- **Coarse Reconstruction Stage**: The coarse generator \(G_1\) makes a preliminary estimate of the missing pixels and produces a coarse inpainted image \(I_{\text{coarse}}\).
- **Refinement Stage**: The refinement generator \(G_2\) refines the details of the coarse inpainted image and produces the final completed image \(I_{\text{out}}\).
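
The SPD block's use of "different receptive fields" can be made concrete with the standard arithmetic for dilated convolutions: a kernel of size \(k\) with dilation rate \(d\) spans \(k + (k - 1)(d - 1)\) pixels along each axis. The sketch below is only an illustration of that formula. The dilation rates, the 3x3 kernel size, and the assumption that the block runs its dilated convolutions as parallel branches are hypothetical and not taken from the paper.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Pixels spanned along one axis by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def widest_branch_extent(kernel_size: int, dilations: list[int]) -> int:
    """Largest span among parallel branches (assumed pyramid layout)."""
    return max(effective_kernel_size(kernel_size, d) for d in dilations)


# Hypothetical dilation rates chosen for illustration only.
HYPOTHETICAL_DILATIONS = [1, 2, 4, 8]

for d in HYPOTHETICAL_DILATIONS:
    print(f"dilation={d}: spans {effective_kernel_size(3, d)} pixels")
# dilation=1: spans 3 pixels
# dilation=2: spans 5 pixels
# dilation=4: spans 9 pixels
# dilation=8: spans 17 pixels

print(widest_branch_extent(3, HYPOTHETICAL_DILATIONS))  # 17
```

Under these assumptions, a single block already mixes features that are up to 17 pixels apart while keeping the parameter count of ordinary 3x3 kernels, which is the usual motivation for combining several dilation rates.
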
### Network Architecture

- **Coarse Reconstruction Stage**: Adopts an encoder-decoder structure and uses SPD ResNet blocks with different dilation rates to expand the receptive field.
- **Refinement Stage**: Also adopts an encoder-decoder structure, but adds MSSA blocks and the BP technique to improve the coherence and alignment of the inpainting results.

### Loss Function

The loss function includes five main terms:

- **L1 Loss**: Ensures pixel-level reconstruction accuracy.
- **Adversarial Loss**: Pushes the distribution of generated images toward that of real images.
- **Perceptual Loss**: Encourages the generated image and the reference ground-truth image to have similar feature representations.
- **Style Loss**: Emphasizes the style similarity between the generated image and the real image.
- **Total Variation Loss**: Serves as a regularization term that encourages smoothness in the generated image.

### Experimental Results

The paper reports experiments on multiple datasets, including FFHQ and Oxford Buildings. The results show that DeepGIN outperforms existing state-of-the-art methods both quantitatively and qualitatively, especially on images with complex and irregular masks.

### Conclusion

By introducing the SPD, MSSA, and BP techniques, DeepGIN improves the quality and robustness of image inpainting and can produce satisfactory visual and contextual results for masks of various types and sizes.
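
The five loss terms listed under Loss Function can be illustrated with a small NumPy sketch. It is not the authors' implementation. The weights are hypothetical placeholders, the perceptual and style terms take precomputed feature maps instead of running a pretrained network, the Gram-matrix normalization is one common choice among several, and the adversarial term is passed in as a plain number because it depends on a discriminator that is not modeled here.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute difference (pixel-level reconstruction term)."""
    return float(np.mean(np.abs(pred - target)))


def total_variation_loss(img):
    """Mean absolute difference between neighboring pixels of an (H, W, C) image."""
    dh = np.abs(img[1:, :, :] - img[:-1, :, :])
    dw = np.abs(img[:, 1:, :] - img[:, :-1, :])
    return float(dh.mean() + dw.mean())


def gram_matrix(feat):
    """Channel-by-channel correlations of an (H, W, C) feature map."""
    h, w, c = feat.shape
    flat = feat.reshape(h * w, c)
    return flat.T @ flat / (h * w * c)  # normalization is an assumption


def perceptual_loss(feat_pred, feat_target):
    """Distance between feature maps (L1 distance assumed here)."""
    return l1_loss(feat_pred, feat_target)


def style_loss(feat_pred, feat_target):
    """Distance between Gram matrices of the feature maps."""
    return l1_loss(gram_matrix(feat_pred), gram_matrix(feat_target))


# Hypothetical weights for illustration; the paper's values are not given here.
HYPOTHETICAL_WEIGHTS = {
    "l1": 1.0, "adversarial": 0.01, "perceptual": 0.1, "style": 10.0, "tv": 0.1,
}


def total_loss(pred, target, feat_pred, feat_target, adversarial_term,
               weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum of the five terms."""
    terms = {
        "l1": l1_loss(pred, target),
        "adversarial": adversarial_term,
        "perceptual": perceptual_loss(feat_pred, feat_target),
        "style": style_loss(feat_pred, feat_target),
        "tv": total_variation_loss(pred),
    }
    return sum(weights[name] * value for name, value in terms.items())


# Usage with random stand-ins for images and feature maps.
rng = np.random.default_rng(0)
pred, target = rng.random((8, 8, 3)), rng.random((8, 8, 3))
feat_pred, feat_target = rng.random((4, 4, 16)), rng.random((4, 4, 16))
print(total_loss(pred, target, feat_pred, feat_target, adversarial_term=0.5))
```

In a real training setup the feature maps would come from a fixed pretrained network and the adversarial term from a discriminator, and all five terms would be computed in a framework with automatic differentiation.
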