Abstract: The scene text removal (STR) task aims to remove text regions from images and smoothly recover the background, for the protection of private information. Most existing STR methods adopt encoder-decoder CNNs that copy the encoded features directly through the skip connections. However, the encoded features contain both text texture and structure information, and insufficient utilization of these text features hampers background reconstruction in the text removal regions. To tackle these problems, we propose a novel Feature Erasing and Transferring (FET) mechanism that reconfigures the encoded features for STR. In FET, a Feature Erasing Module (FEM) is designed to erase text features, an attention module generates the feature similarity guidance, and a Feature Transferring Module (FTM) transfers the corresponding features in different layers based on this attention guidance. With this mechanism, we construct a one-stage, end-to-end trainable network called FETNet for scene text removal. In addition, to facilitate research on both scene text removal and segmentation, we introduce a novel dataset, Flickr-ST, with multi-category annotations. Extensive experiments and ablation studies are conducted on public datasets and Flickr-ST. Our proposed method achieves state-of-the-art performance on most metrics, with remarkably higher-quality scene text removal results. The source code of our work is available at https://github.com/GuangtaoLyu/FETNet.

EraseNet: End-to-End Text Removal in the Wild

MTRNet: A Generic Scene Text Eraser

Self-Supervised Text Erasing with Controllable Image Synthesis

Scene text removal via cascaded text stroke detection and erasing

Stroke-Based Scene Text Erasing Using Synthetic Data for Training

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser

What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties

A Simple and Strong Baseline: Progressively Region-based Scene Text Removal Networks

Scene Text Eraser

FETNet: Feature Erasing and Transferring Network for Scene Text Removal

PERT: A Progressively Region-based Network for Scene Text Removal

Progressive Scene Text Erasing with Self-Supervision

Modeling Stroke Mask for End-to-End Text Erasing

PSSTRNet: Progressive Segmentation-guided Scene Text Removal Network

MSLKANet: A Multi-Scale Large Kernel Attention Network for Scene Text Removal

MagicEraser: Erasing Any Objects via Semantics-Aware Control

Editing Text in the Wild

RealEra: Semantic-level Concept Erasure via Neighbor-Concept Mining

Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models