Abstract: Scene text detection is challenging because text varies widely in appearance, background, and orientation. Improving robustness, accuracy, and efficiency in this setting is vital for applications such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust and effective text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. The VAE module, built on an encoder-decoder structure, generates diverse and variable text regions. The GAN module, built on a generator-discriminator structure, then refines these regions to make them more realistic and accurate. Finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss. The VAE loss encourages diversity in the generated text regions, the GAN loss encourages realism and accuracy, and the text detection loss drives high-precision identification of text regions. Testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
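
The joint loss described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the three terms or how they are weighted, so everything below is an assumption made for illustration: a mean-squared reconstruction error plus a KL term for the VAE loss, a generator-side binary cross-entropy for the GAN loss, a binary cross-entropy on per-region confidence scores for the detection loss, and the hypothetical weights `w_vae`, `w_gan`, and `w_det`. The discriminator's own update is omitted.

```python
import math


def kl_divergence(mu, log_var):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clamped for stability."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def joint_loss(recon, target, mu, log_var, disc_on_fake,
               det_scores, det_labels,
               w_vae=1.0, w_gan=1.0, w_det=1.0):
    """Weighted sum of the three terms (weights are hypothetical)."""
    # Assumed VAE loss: reconstruction error plus KL regulariser.
    vae = mse(recon, target) + kl_divergence(mu, log_var)
    # Assumed generator-side GAN loss: the generator wants the
    # discriminator to output 1 on the refined regions.
    gan = bce(disc_on_fake, [1.0] * len(disc_on_fake))
    # Assumed detection loss: cross-entropy on per-region confidence scores.
    det = bce(det_scores, det_labels)
    return w_vae * vae + w_gan * gan + w_det * det
```

With equal weights the three objectives are simply summed. In practice the weights would likely need tuning so that no single term dominates training, but the abstract does not say how the authors balance them.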
