Abstract: Scene-text image synthesis techniques, which aim to compose text instances naturally on background scene images, are appealing for training deep neural networks because they provide accurate and comprehensive annotation information. Prior studies have explored generating synthetic text images on two-dimensional and three-dimensional surfaces using rules derived from real-world observations. Some of these studies have proposed generating scene-text images through learning; however, owing to the absence of a suitable training dataset, they have relied on unsupervised frameworks that learn from existing real-world data, which might not yield reliable performance. To ease this dilemma and facilitate research on learning-based scene text synthesis, we introduce DecompST, a real-world dataset prepared from public benchmarks, containing three types of annotations: quadrilateral-level bounding boxes, stroke-level text masks, and text-erased images. Leveraging the DecompST dataset, we propose a Learning-Based Text Synthesis engine (LBTS) that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet). TLPNet first predicts suitable regions for text embedding, after which TAANet adaptively adjusts the geometry and color of the text instance to match the background context. After training, the two networks can be integrated and used to generate synthetic datasets for scene text analysis tasks. Comprehensive experiments were conducted to compare the proposed LBTS with existing methods, and the results indicate that LBTS can generate better pretraining data for scene text detectors.

Synthesizing Data for Text Recognition with Style Transfer

UATST: Towards Unpaired Arbitrary Text-Guided Style Transfer with Cross-Space Modulation

TeSTNeRF: Text-Driven 3D Style Transfer Via Cross-Modal Learning

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

ITstyler: Image-optimized Text-based Style Transfer

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Transductive Learning for Unsupervised Text Style Transfer

A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data

Traditional Chinese Synthetic Datasets Verified with Labeled Data for Scene Text Recognition

A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

Style Transfer in Text: Exploration and Evaluation

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition

TET-GAN: Text Effects Transfer via Stylization and Destylization

Semi-supervised Text Style Transfer: Cross Projection in Latent Space

Text Style Transfer Via Learning Style Instance Supported Latent Space

Chinese Text Detection Using Deep Learning Model And Synthetic Data

TE141K: Artistic Text Benchmark for Text Effect Transfer

ST$^2$: Small-data Text Style Transfer via Multi-task Meta-Learning

APRNet: Attention-based Pixel-wise Rendering Network for Photo-Realistic Text Image Generation

FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models