Abstract: Driven by advances in deep learning, scene text detection algorithms have developed rapidly in recent years. Methods that represent text regions with a text segmentation map have proved able to capture arbitrary-shaped text more flexibly and accurately. However, such segmentation-based methods are easily disturbed by text-like background patterns (such as fences or grass) and therefore tend to produce imprecise boundary details. In this paper, we propose LEMNet, which addresses the imprecise boundary problem by guiding the generation of text boundaries with an a priori constraint. In the training stage, a Boundary Segmentation Branch is first constructed to predict a coarse boundary mask for each text instance. Then, by mapping pixels into an embedding space, the proposed Pixel Embedding Branch learns embeddings in which boundary points are more similar to one another, while the distance between background points and boundary points is enlarged. During inference, a Noisy Point Suppression Algorithm that operates on the pixel embedding vectors suppresses noise in the coarse boundary segmentation map. In this way, LEMNet generates a more precise boundary description of text regions. To further enhance the distinguishability of boundary features, we propose a Context Enhancement Module that captures feature interactions in different representation subspaces: attention is applied to the features in parallel, and the outputs are concatenated to generate enhanced features. Extensive experiments on four challenging datasets demonstrate the effectiveness of LEMNet. Specifically, LEMNet achieves F-measures of 85.2%, 87.6% and 85.2% on CTW1500, Total-Text and MSRA-TD500, respectively, which are state-of-the-art results.
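The abstract does not spell out how the Noisy Point Suppression Algorithm works, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each pixel already has an embedding vector, that true boundary pixels cluster around a common reference embedding, and that a pixel can be treated as noise when its Euclidean distance to that reference exceeds a threshold. The function name `suppress_noisy_points`, the coordinate-wise median used as the reference, and the `max_dist` parameter are all hypothetical choices made for illustration.

```python
import math
import statistics


def suppress_noisy_points(coarse_mask, embeddings, max_dist=0.5):
    """Return a copy of coarse_mask with embedding outliers cleared.

    coarse_mask: H x W nested list of 0/1 values (coarse boundary prediction).
    embeddings:  H x W nested list of equal-length float vectors, one per pixel.
    max_dist:    hypothetical distance threshold (an assumption, not from the paper).
    """
    # Collect the coordinates of all pixels marked as boundary.
    points = [
        (r, c)
        for r, row in enumerate(coarse_mask)
        for c, value in enumerate(row)
        if value
    ]
    refined = [list(row) for row in coarse_mask]
    if not points:
        return refined

    # Assumed reference embedding: coordinate-wise median over boundary pixels,
    # chosen here because it is less sensitive to outliers than the mean.
    dim = len(embeddings[points[0][0]][points[0][1]])
    reference = [
        statistics.median(embeddings[r][c][d] for r, c in points)
        for d in range(dim)
    ]

    # Clear every boundary pixel whose embedding lies too far from the reference.
    for r, c in points:
        if math.dist(embeddings[r][c], reference) > max_dist:
            refined[r][c] = 0
    return refined
```

In this sketch the threshold plays the role of the learned separation between boundary and background embeddings; a real system would derive it from the training objective rather than fix it by hand.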

Boundary-aware Arbitrary-shaped Scene Text Detector with Learnable Embedding Network
