Abstract: Deep learning-based image inpainting methods have recently made great strides in reconstructing damaged regions. However, they often produce unsatisfactory results when the missing regions (holes) are large, which leads to structural distortion and blurred textures. To address these problems, we propose an image inpainting method that combines the strengths of transformers and convolutions and incorporates edge priors and attention mechanisms. The method aims to improve the inpainting of large holes by restoring structure more accurately and recovering finer texture detail. It divides the inpainting task into two phases: edge prediction and image inpainting. In the edge prediction phase, we design a transformer architecture that combines axial attention with standard self-attention. This design strengthens the extraction of global structural features and positional awareness while keeping the computational complexity of self-attention in check, so that the edge structure in the missing region is predicted accurately. In the image inpainting phase, we introduce a multi-scale fusion attention module. This module makes full use of long-range features at multiple levels and enhances local pixel continuity, thereby significantly improving inpainting quality. To evaluate our method, we conduct comparative experiments on several datasets, including CelebA, Places2, and Facade. Quantitative experiments show that our method outperforms other mainstream methods: it improves Peak Signal-to-Noise Ratio (PSNR) by 1.141 to 3.234 dB and Structural Similarity Index Measure (SSIM) by 0.083 to 0.235, and it reduces Learned Perceptual Image Patch Similarity (LPIPS) by 0.0347 to 0.1753 and Mean Absolute Error (MAE) by 0.0104 to 0.0402.
Qualitative experiments show that our method reconstructs images with complete structural information and clear texture details. Our model also compares favorably in parameter count, memory cost, and testing time.
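The cost argument behind combining axial attention with standard self-attention can be made concrete. The following is a minimal NumPy sketch, not the authors' implementation: it assumes a single head, no learned query/key/value projections, and no positional encodings, and it does not reproduce how the two attention types are combined in the actual edge-prediction transformer. The function names `full_self_attention` and `axial_attention` are hypothetical. Full self-attention scores every pair of the H·W positions, giving (HW)² scores, while axial attention attends along rows and then columns, giving HW(H + W) scores.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_self_attention(x):
    """Standard single-head self-attention over all H*W positions.

    x: feature map of shape (H, W, C). Simplifying assumption: queries,
    keys and values are x itself (no learned projections).
    Returns the output map and the number of attention scores computed.
    """
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    scores = tokens @ tokens.T / np.sqrt(c)          # (HW, HW) score matrix
    out = softmax(scores) @ tokens
    return out.reshape(h, w, c), scores.size

def axial_attention(x):
    """Row attention followed by column attention (single head, no projections)."""
    h, w, c = x.shape
    # Row pass: each of the H rows attends over its own W positions.
    row_scores = np.einsum("hic,hjc->hij", x, x) / np.sqrt(c)   # (H, W, W)
    x = np.einsum("hij,hjc->hic", softmax(row_scores), x)
    # Column pass: each of the W columns attends over its own H positions.
    col_scores = np.einsum("iwc,jwc->wij", x, x) / np.sqrt(c)   # (W, H, H)
    x = np.einsum("wij,jwc->iwc", softmax(col_scores), x)
    return x, row_scores.size + col_scores.size

rng = np.random.default_rng(0)
feat = rng.standard_normal((32, 32, 8))
_, n_full = full_self_attention(feat)
_, n_axial = axial_attention(feat)
print(n_full, n_axial)  # 1048576 65536
```

On a 32×32 map the axial pass computes 16 times fewer attention scores, and the gap widens as resolution grows. The trade-off is that a single axial pass only mixes information along rows and columns, which is one reason to pair it with standard self-attention rather than replace it outright.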
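For reading the reported gains, the two simplest metrics can be sketched directly. This assumes images scaled to [0, 1] (a data range of 1.0) and metrics computed over the whole image; the exact evaluation protocol (data range, masked versus full image, averaging) is not stated here and may differ. SSIM and LPIPS are omitted because SSIM needs windowed local statistics and LPIPS needs a pretrained network. The helper names `mae` and `psnr` are illustrative.

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error between two images with values in [0, 1]."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

target = np.full((4, 4), 0.5)
pred = target + 0.1                   # uniform error of 0.1 at every pixel
print(round(mae(pred, target), 4))    # 0.1
print(round(psnr(pred, target), 2))   # 20.0
```

Because PSNR is logarithmic in MSE, a gain of Δ dB at a fixed data range shrinks the MSE by a factor of 10^(Δ/10). For a single image, the reported gains of 1.141 to 3.234 dB would therefore correspond to roughly 23% to 53% lower MSE. PSNR averaged over a dataset does not map exactly onto averaged MSE, so this is only a rough reading.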

CTNet: hybrid architecture based on CNN and transformer for image inpainting detection

CTH-Net: CNN-Transformer Hybrid Network for Garment Image Generation from Sketches and Color Points

UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation

A transformer–CNN for deep image inpainting forensics

Transformer-Based Image Inpainting Detection via Label Decoupling and Constrained Adversarial Training

Free-Form Image Inpainting with Separable Gate Encoder-Decoder Network

Image Inpainting Detection Based on Multi-task Deep Learning Network

Image Inpainting Based on Interactive Separation Network and Progressive Reconstruction Algorithm

ITrans: generative image inpainting with transformers

The Improved Image Inpainting Algorithm Via Encoder and Similarity Constraint

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention

A Double Feature Fusion Network with Progressive Learning for Sharper Inpainting

Image Inpainting Detection Based on High-Pass Filter Attention Network

DNNAM: Image inpainting algorithm via deep neural networks and attention mechanism

Image Inpainting Technique Incorporating Edge Prior and Attention Mechanism

Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting

Bridging partial-gated convolution with transformer for smooth-variation image inpainting

Parallel Multi-Resolution Fusion Network for Image Inpainting

Delving Globally into Texture and Structure for Image Inpainting

Enhanced Wavelet Scattering Network for image inpainting detection