Self-Supervised Text Erasing with Controllable Image Synthesis

Gangwei Jiang,Shiyao Wang,Tiezheng Ge,Yuning Jiang,Ying Wei,Defu Lian
DOI: https://doi.org/10.1145/3503161.3547905
2022-01-01
Abstract:Recent efforts on text erasing have shown promising results. However, existing methods require rich yet costly label annotations to obtain robust models, which limits their use for practical applications. To this end, we study an unsupervised scenario by proposing a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground-truth and accurately erase texts in the real world. We first design a styleaware image synthesis function to generate synthetic images with diverse styled texts based on two synthetic mechanisms. To bridge the text style gap between the synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms by picking style parameters with the guidance of two specifically designed rewards. The synthetic training images with ground-truth are then fed to train a coarse-to-fine erasing network. To produce better erasing outputs, a triplet erasure loss is designed to enforce the refinement stage to recover background textures. Moreover, we provide a new dataset (called PosterErase), which contains 60K high-resolution posters and is more challenging for the erasing task. The proposed method has been extensively evaluated with both PosterErase and the widely-used SCUT-Enstext dataset. Notably, on PosterErase, our method achieves 5.07 in terms of FID, with a relative improvement of 20.9% over existing supervised baselines.
What problem does this paper attempt to address?