Abstract: Scene text detection is challenging because text appearance, backgrounds, and orientations vary widely. Improving robustness, accuracy, and efficiency in this setting is vital for applications such as optical character recognition, image understanding, and autonomous vehicles. This paper explores the integration of a generative adversarial network (GAN) and a variational autoencoder (VAE) to create a robust text detection network. The proposed architecture comprises three interconnected modules: the VAE module, the GAN module, and the text detection module. The VAE module, built on an encoder-decoder structure, generates diverse and variable text regions. The GAN module, built on a generator-discriminator structure, then refines these regions to improve their realism and accuracy. Finally, the text detection module identifies text regions in the input image by assigning a confidence score to each region. The entire network is trained by minimizing a joint loss function that combines the VAE loss, the GAN loss, and the text detection loss: the VAE loss promotes diversity in the generated text regions, the GAN loss promotes realism and accuracy, and the text detection loss promotes high-precision identification of text regions. Testing on diverse datasets, including Total-Text, CTW1500, ICDAR 2015, ICDAR 2017, ReCTS, TD500, COCO-Text, SynthText, Street View Text, and KAIST Scene Text, demonstrates the superior performance of the proposed method compared to existing approaches.
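
The joint objective described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the three losses or how they are weighted, so everything here is an assumption chosen for illustration: the VAE term uses the standard reconstruction-plus-KL form, the GAN term uses the common non-saturating generator loss, the detection term uses binary cross-entropy on region confidence scores, and the weights `w_vae`, `w_gan`, and `w_det` are hypothetical parameters.

```python
import math


def vae_loss(recon_error, mu, log_var):
    """Assumed VAE term: reconstruction error plus KL(q(z|x) || N(0, I))."""
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))
    return recon_error + kl


def gan_generator_loss(d_fake):
    """Assumed GAN term: non-saturating generator loss, -mean(log D(G(z)))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


def detection_loss(scores, labels, eps=1e-12):
    """Assumed detection term: binary cross-entropy between per-region
    confidence scores in (0, 1) and 0/1 text labels."""
    total = 0.0
    for s, y in zip(scores, labels):
        total += y * math.log(s + eps) + (1 - y) * math.log(1 - s + eps)
    return -total / len(scores)


def joint_loss(l_vae, l_gan, l_det, w_vae=1.0, w_gan=1.0, w_det=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return w_vae * l_vae + w_gan * l_gan + w_det * l_det


if __name__ == "__main__":
    l_vae = vae_loss(recon_error=0.8, mu=[0.1, -0.2], log_var=[0.0, 0.1])
    l_gan = gan_generator_loss(d_fake=[0.4, 0.6])
    l_det = detection_loss(scores=[0.9, 0.2], labels=[1, 0])
    print(joint_loss(l_vae, l_gan, l_det))
```

In a sketch like this, the weights would let training trade off diversity (VAE term), realism (GAN term), and detection precision (detection term); whether the paper weights the terms at all is not stated in the abstract.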
