Text Image Inpainting via Global Structure-Guided Diffusion Models

Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, Hui Xue

2024-08-01

Abstract: Real-world text can be damaged by corrosion issues caused by environmental or human factors, which hinder the preservation of the complete styles of texts, e.g., texture and structure. These corrosion issues, such as graffiti signs and incomplete signatures, bring difficulties in understanding the texts, thereby posing significant challenges to downstream applications, e.g., scene text recognition and signature identification. Notably, current inpainting techniques often fail to adequately address this problem and have difficulties restoring accurate text images along with reasonable and consistent styles. Formulating this as an open problem of text image inpainting, this paper aims to build a benchmark to facilitate its study. In doing so, we establish two specific text inpainting datasets which contain scene text images and handwritten text images, respectively. Each of them includes images revamped by real-life and synthetic datasets, featuring pairs of original images, corrupted images, and other assistant information. On top of the datasets, we further develop a novel neural framework, Global Structure-guided Diffusion Model (GSDM), as a potential solution. Leveraging the global structure of the text as a prior, the proposed GSDM develops an efficient diffusion model to recover clean texts. The efficacy of our approach is demonstrated by thorough empirical study, including a substantial boost in both recognition accuracy and image quality. These findings not only highlight the effectiveness of our method but also underscore its potential to enhance the broader field of text image understanding and processing. Code and datasets are available at: https://github.com/blackprotoss/GSDM

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

The paper aims to address the issue of damage to text images caused by corrosion. Specifically, text in the real world may be corroded due to environmental or human factors, which can compromise the integrity of the text, such as its texture and structure, thereby affecting the understanding of the text. These issues present significant challenges in downstream applications such as scene text recognition and signature verification. Current image restoration techniques often struggle to adequately address this problem, failing to accurately restore text images and their reasonable, consistent styles. To tackle this challenge, the paper proposes the following objectives:

1. **Establish a Benchmark**: Create a benchmark to promote research in text image restoration. To this end, the paper creates two specific text restoration datasets, containing scene text images and handwritten text images, respectively.
2. **Develop a New Model**: Propose a new neural network framework, the Global Structure-guided Diffusion Model (GSDM), to address the problem of text image restoration. This model leverages the global structure of the text as prior information to restore clear text images through an efficient diffusion model.
3. **Evaluate the Method's Effectiveness**: Demonstrate the effectiveness of the proposed method through extensive empirical studies, including significant improvements in recognition accuracy and image quality.

In summary, the paper aims to improve the quality and accuracy of text image restoration by establishing benchmarks and developing new models, thereby enhancing the overall level of understanding and processing of text images.
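To make the benchmark setup more concrete, the following is a minimal sketch of how a synthetic (original, corrupted, mask) sample might be built and how image quality could be scored. It is not the paper's pipeline: the `corrupt` and `psnr` helpers, the rectangular mask, and the toy image are all assumptions made for illustration, and PSNR is used only as one common image-quality measure, not necessarily the metric the authors report.

```python
import math
import random


def corrupt(image, mask_h, mask_w, seed=0, fill=255):
    """Return (corrupted, mask) for a grayscale image given as a list of rows.

    Hypothetical helper: a mask_h x mask_w rectangle is overwritten with
    `fill`; mask[y][x] is 1 where a pixel was corrupted and 0 elsewhere.
    """
    height, width = len(image), len(image[0])
    rng = random.Random(seed)
    top = rng.randint(0, height - mask_h)   # both bounds inclusive
    left = rng.randint(0, width - mask_w)
    corrupted = [row[:] for row in image]   # copy so the original is untouched
    mask = [[0] * width for _ in range(height)]
    for y in range(top, top + mask_h):
        for x in range(left, left + mask_w):
            corrupted[y][x] = fill
            mask[y][x] = 1
    return corrupted, mask


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 8x16 "text image": dark strokes (0) on a mid-gray background (128).
original = [[0 if x % 4 == 0 else 128 for x in range(16)] for _ in range(8)]
corrupted, mask = corrupt(original, mask_h=4, mask_w=6)

# One paired sample, mirroring the "original / corrupted / auxiliary" layout.
sample = {"original": original, "corrupted": corrupted, "mask": mask}
print(f"PSNR(original, corrupted) = {psnr(original, corrupted):.2f} dB")
```

An inpainting model would take `sample["corrupted"]` as input and be scored by comparing its output with `sample["original"]`. Recognition accuracy would additionally require a text recognizer, which this sketch leaves out.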
