ITrans: generative image inpainting with transformers

Wei Miao, Lijun Wang, Huchuan Lu, Kaining Huang, Xinchu Shi, Bocong Liu

DOI: https://doi.org/10.1007/s00530-023-01211-w

Impact factor (IF): 3.9

2024-01-19

Multimedia Systems

Abstract: Despite significant improvements, convolutional neural network (CNN) based methods struggle to handle long-range global image dependencies because of their limited receptive fields, which leads to unsatisfactory inpainting performance in complicated scenarios. To address this issue, we propose the Inpainting Transformer (ITrans) network, which combines the power of both self-attention and convolution operations. The ITrans network augments a convolutional encoder–decoder structure with two novel designs, i.e., the global and local transformers. The global transformer aggregates high-level image context from the encoder from a global perspective and propagates the encoded global representation to the decoder in a multi-scale manner. Meanwhile, the local transformer is designed to extract low-level image details inside the local neighborhood at a reduced computational overhead. By incorporating these two transformers, ITrans is capable of both global relationship modeling and local detail encoding, which is essential for hallucinating perceptually realistic images. Extensive experiments demonstrate that the proposed ITrans network performs favorably against state-of-the-art inpainting methods both quantitatively and qualitatively.
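The abstract contrasts a global transformer, which relates every position to every other, with a local transformer that works "inside the local neighborhood at a reduced computational overhead." The NumPy sketch below illustrates that trade-off under stated assumptions: it uses plain scaled dot-product self-attention without learned query/key/value projections, multiple heads, or positional encodings, and it realizes "local" as attention within non-overlapping square windows. That windowed scheme is a common technique swapped in here for illustration; the abstract does not say how ITrans defines its neighborhoods. The function names and the `window` parameter are hypothetical, not taken from the paper.

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q @ k^T / sqrt(d)) @ v.

    Works on (N, C) arrays or batched (B, N, C) arrays.
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def global_self_attention(x):
    """Every position in an (H, W, C) feature map attends to all H*W positions."""
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    return attention(tokens, tokens, tokens).reshape(h, w, c)


def local_self_attention(x, window):
    """Each position attends only to positions in its own window x window block.

    Hypothetical helper: non-overlapping square windows; H and W must be
    divisible by `window`.
    """
    h, w, c = x.shape
    if h % window or w % window:
        raise ValueError("H and W must be divisible by window")
    nh, nw = h // window, w // window
    # (H, W, C) -> (nh, nw, window, window, C) -> (nh*nw, window*window, C)
    blocks = x.reshape(nh, window, nw, window, c).transpose(0, 2, 1, 3, 4)
    blocks = blocks.reshape(nh * nw, window * window, c)
    out = attention(blocks, blocks, blocks)
    # Undo the partition to recover the (H, W, C) layout.
    out = out.reshape(nh, nw, window, window, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)


def num_attention_scores(h, w, window=None):
    """Number of query-key scores one layer computes (global if window is None)."""
    n = h * w
    return n * n if window is None else n * window * window


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 16, 8))
    print(global_self_attention(feat).shape)       # (16, 16, 8)
    print(local_self_attention(feat, 4).shape)     # (16, 16, 8)
    print(num_attention_scores(16, 16))            # 65536
    print(num_attention_scores(16, 16, window=4))  # 4096
```

On a 16 × 16 feature map, global attention computes 65,536 pairwise scores, while 4 × 4 windows need only 4,096, a 16-fold reduction. The price is that no information crosses window boundaries within a single layer, which is one reason to pair local attention with a global component, as ITrans does.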

computer science, information systems, theory & methods

What problem does this paper attempt to address?

The paper addresses a limitation of traditional convolutional neural network (CNN) methods for image inpainting: because their receptive fields are limited, they struggle to capture long-range global dependencies in an image, which degrades inpainting quality in complex scenes. Specifically, the paper makes the following points:

- **Problem background**: Image inpainting (or image completion) is the task of filling in missing pixels to produce a complete image. It is used in many image-editing applications, such as object removal, image restoration, and photo retouching. Before the deep-learning era, such tasks were mainly solved by copying existing image patches into the occluded areas. Because these patch-based methods lack semantic understanding, they have largely been replaced by methods based on deep neural networks.
- **Limitations of existing methods**: CNN-based methods generate fine details well, but their limited receptive fields cannot gather enough context for high-quality inpainting, especially in complex scenes, which leads to unwanted artifacts and blurry results.
- **New challenges**: Transformer models have recently achieved state-of-the-art results on many computer vision tasks, particularly because they model long-range dependencies well. However, Transformers lack the inductive biases of convolutions (such as locality), which makes them harder to apply to images. Although their performance ceiling is higher than that of CNNs, they are harder to train because of complex pre-training requirements.

To address these problems, the authors propose the Inpainting Transformer (ITrans) network, which combines the strengths of CNNs and Transformers to improve inpainting quality. ITrans adds a global Transformer module, which models relationships across the whole image, and a local Transformer module, which encodes fine details within local neighborhoods; together they allow the network to generate more perceptually realistic images.
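The receptive-field limitation that motivates ITrans can be made concrete with the standard recurrence for the theoretical receptive field of stacked convolutions: each layer with kernel size k and dilation d adds (k - 1) * d times the current cumulative stride (the "jump") to the receptive field. The pure-Python sketch below is illustrative only; the layer stacks are hypothetical and are not the ITrans encoder, whose configuration is not given here.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of one output unit.

    `layers` is a list of (kernel_size, stride, dilation) tuples, applied in
    order. Uses the standard recurrence:
        rf   <- rf + (kernel_size - 1) * dilation * jump
        jump <- jump * stride
    where `jump` is the spacing, in input pixels, between adjacent units of
    the current layer.
    """
    rf, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical stack: ten 3x3 convolutions, stride 1, no dilation.
    plain = [(3, 1, 1)] * 10
    # Hypothetical encoder: a 7x7 conv, two stride-2 4x4 downsampling convs,
    # then eight 3x3 convs at the reduced resolution.
    encoder = [(7, 1, 1), (4, 2, 1), (4, 2, 1)] + [(3, 1, 1)] * 8
    print(receptive_field(plain))    # 21
    print(receptive_field(encoder))  # 80
```

Even the downsampled stack sees only an 80 × 80 pixel window, well short of a hypothetical 256 × 256 input, so context from distant regions cannot inform a large hole without many more layers; in practice the effective receptive field is typically smaller than this theoretical bound. A single global self-attention layer, by contrast, lets every position attend to every other position.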


UCTGAN: Diverse Image Inpainting Based on Unsupervised Cross-Space Translation

HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention

Delving Globally into Texture and Structure for Image Inpainting

CTNet: hybrid architecture based on CNN and transformer for image inpainting detection

A transformer–CNN for deep image inpainting forensics

Bridging partial-gated convolution with transformer for smooth-variation image inpainting

Transformer-Based Image Inpainting Detection via Label Decoupling and Constrained Adversarial Training

The Improved Image Inpainting Algorithm Via Encoder and Similarity Constraint

PIPformers: Patch based inpainting with vision transformers for generalize paintings

Image Inpainting Based on Interactive Separation Network and Progressive Reconstruction Algorithm

TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

Inpainting Transformer for Anomaly Detection

Transformer-based Image and Video Inpainting: Current Challenges and Future Directions

SyFormer: Structure-Guided Synergism Transformer for Large-Portion Image Inpainting

Sparse self-attention transformer for image inpainting

Image Inpainting Technique Incorporating Edge Prior and Attention Mechanism

Interactive Separation Network for Image Inpainting

Decoupled Spatial-Temporal Transformer for Video Inpainting