Modeling Stroke Mask for End-to-End Text Erasing

Xiangcheng Du, Zhao Zhou, Yingbin Zheng, Tianlong Ma, Xingjiao Wu, Cheng Jin
DOI: https://doi.org/10.1109/wacv56688.2023.00609
2023-01-01
Abstract: Scene text erasing aims to remove text regions from scene images and fill them with plausible background. Most previous approaches employ scene text detectors to help localize the text regions. However, detected text boxes contain both text strokes and background clutter, and inpainting the whole box directly may leave text artifacts and make the region look unnatural. In this paper, we present an end-to-end network that focuses on modeling text stroke masks, which provide more accurate locations for computing erased images. The network consists of two stages: a basic network with stroke generation and a refinement network with stroke awareness. The basic network predicts the text stroke masks and initial erasing results simultaneously. The refinement network receives the masks as supervision to generate natural erased results. Experiments on both synthetic and real-world scene images demonstrate the effectiveness of our framework in producing high-quality erasing results.
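The abstract does not state how a stroke mask is combined with an erased image, so the following is only a minimal sketch of one common option, assumed here for illustration: a per-pixel blend that takes the erased result where the mask marks a stroke and keeps the input elsewhere. The function name `composite_with_stroke_mask`, the array shapes, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def composite_with_stroke_mask(image, coarse_erased, stroke_mask):
    """Blend an erased image into the input only where strokes are marked.

    Assumed (not from the paper) conventions:
      image, coarse_erased: float arrays of shape (H, W, 3) in [0, 1]
      stroke_mask: float array of shape (H, W) in [0, 1], 1 = text stroke
    """
    mask = stroke_mask[..., None]  # add a channel axis so it broadcasts over RGB
    return mask * coarse_erased + (1.0 - mask) * image

# Toy example: a 4x4 black image with a 2x2 white "text" patch.
image = np.zeros((4, 4, 3))
image[1:3, 1:3] = 1.0

# Stand-in for a network's erased output (uniform gray, purely illustrative).
coarse_erased = np.full((4, 4, 3), 0.5)

# Mask that marks exactly the text patch.
stroke_mask = np.zeros((4, 4))
stroke_mask[1:3, 1:3] = 1.0

out = composite_with_stroke_mask(image, coarse_erased, stroke_mask)
```

With a stroke-level mask, pixels outside the strokes are copied from the input unchanged, which illustrates why a tighter mask than a detection box leaves more of the original background untouched.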