Abstract: Over the last decade, advances in deep learning have driven rapid progress in scene text recognition. However, recognizing low-resolution scene text images remains a challenge. Although some super-resolution methods have been proposed to tackle this problem, they usually treat text images as general images and ignore the fact that the visual quality of strokes (the atomic units of text) plays an essential role in text recognition. According to Gestalt psychology, humans are capable of composing partial details into the most similar objects, guided by prior knowledge. Likewise, when humans observe a low-resolution text image, they inherently use partial stroke-level details to recover the appearance of whole characters. Inspired by Gestalt psychology, we put forward a Stroke-Aware Scene Text Image Super-Resolution method containing a Stroke-Focused Module (SFM) that concentrates on the stroke-level internal structures of characters in text images. Specifically, we attempt to design rules for decomposing English characters and digits at the stroke level, then pre-train a text recognizer to provide stroke-level attention maps as positional clues, with the purpose of controlling the consistency between the generated super-resolution image and the high-resolution ground truth. Extensive experimental results validate that the proposed method generates more distinguishable images on TextZoom and on Degraded-IC13, a manually constructed Chinese character dataset. Furthermore, since the proposed SFM is only used to provide stroke-level guidance during training, it adds no time overhead in the test phase. Code is available at https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt.
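
The two ideas in the abstract, decomposing characters into strokes and comparing stroke-level attention maps between the super-resolved and high-resolution images, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `STROKES` table, the stroke IDs, the `EOS` marker, and the choice of a mean absolute difference as the consistency measure are hypothetical and are not taken from the abstract or the released code.

```python
# Hypothetical stroke dictionary: the abstract does not list the actual
# decomposition rules, so these stroke IDs are illustrative only.
STROKES = {
    "L": [1, 2],
    "T": [2, 1],
    "1": [1],
    "7": [2, 3],
}
EOS = 0  # assumed marker appended after each character's strokes


def to_stroke_sequence(text):
    """Flatten a word into a stroke-ID sequence, with EOS after each character."""
    seq = []
    for ch in text:
        seq.extend(STROKES[ch])
        seq.append(EOS)
    return seq


def attention_consistency_loss(attn_sr, attn_hr):
    """Mean absolute difference between two stacks of attention maps.

    Each argument is a list of maps (one per decoded stroke); each map is a
    list of rows, and each row is a list of floats. Both stacks are assumed
    to have identical shapes.
    """
    total, count = 0.0, 0
    for map_sr, map_hr in zip(attn_sr, attn_hr):
        for row_sr, row_hr in zip(map_sr, map_hr):
            for a, b in zip(row_sr, row_hr):
                total += abs(a - b)
                count += 1
    return total / count
```

In this reading, a recognizer trained on stroke sequences would produce one attention map per stroke for both images, and the loss would be small when the super-resolved image draws attention to the same stroke positions as the ground truth. Because such a term is needed only when training, dropping it at test time leaves inference cost unchanged, which matches the abstract's claim about overhead.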

Modeling Stroke Mask for End-to-End Text Erasing

Scene text removal via cascaded text stroke detection and erasing

Stroke-Based Scene Text Erasing Using Synthetic Data for Training

Progressive Scene Text Erasing with Self-Supervision

A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks

PERT: A Progressively Region-based Network for Scene Text Removal

Scene Text Eraser

MTRNet: A Generic Scene Text Eraser

Self-Supervised Text Erasing with Controllable Image Synthesis

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties

Real-Time Scene Text Detection Based on Stroke Model

Maskstr: Guide Scene Text Recognition Models with Masking

Exploring Stroke-Level Modifications for Scene Text Editing

Editing Text in the Wild

FETNet: Feature Erasing and Transferring Network for Scene Text Removal

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Mask-guided GAN for robust text editing in the scene

PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network

Text-Guided Mask-free Local Image Retouching

TextDestroyer: A Training- and Annotation-Free Diffusion Method for Destroying Anomal Text from Images