Abstract: Existing methods for arbitrary-shaped text spotting fall into two categories: bottom-up methods detect and recognize local areas of text and then group them into text lines or words; top-down methods detect text regions of interest and then apply polygon fitting and text recognition to the detected regions. In this paper, we analyze the advantages and disadvantages of these two categories and propose a novel text spotter that fuses bottom-up and top-down processing. To detect text of arbitrary shapes, we employ a bottom-up detector that describes text with a series of rotated squares, and design a top-down detector that represents the region of interest with a minimum enclosing rotated rectangle. The text boundary is then determined by fusing the outputs of the two detectors. To connect arbitrary-shaped text detection and recognition, we propose a differentiable operator named RoISlide, which extracts features for arbitrary text regions from whole-image feature maps. On top of the features extracted by RoISlide, a CNN- and CTC-based text recognizer is introduced, which frees the framework from character-level annotations. To improve robustness against scale variance, we further propose a residual dual scale spotting mechanism, in which two spotters work on different feature levels and the high-level spotter is based on the residuals of the low-level spotter. Our method achieves state-of-the-art performance on four English datasets and one Chinese dataset, covering both arbitrary-shaped and oriented text. We also provide extensive ablation experiments to analyze how the key components affect performance.
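To illustrate why a CTC-based recognizer needs no character-level annotations, the following is a minimal sketch of greedy (best-path) CTC decoding: the recognizer emits one score vector per horizontal frame, and the transcript is obtained by merging consecutive repeated labels and dropping the blank symbol, so no per-character position is ever required. This is a generic illustration, not the paper's implementation; the function name `ctc_greedy_decode`, the toy alphabet, the choice of index 0 for the blank, and the frame scores are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding over a sequence of per-frame score vectors."""
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks; what remains is the transcript.
    return "".join(alphabet[label] for label in merged if label != blank)


# Toy example (hypothetical values): 4 classes, 5 frames.
alphabet = ["-", "a", "b", "c"]  # "-" stands for the blank
frame_scores = [
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.70, 0.10, 0.10],  # a again (merged with the previous frame)
    [0.90, 0.03, 0.03, 0.04],  # blank separates the two "a" characters
    [0.10, 0.60, 0.20, 0.10],  # a
    [0.10, 0.10, 0.70, 0.10],  # b
]

decoded = ctc_greedy_decode(frame_scores, alphabet)
print(decoded)  # aab
```

The blank frame is what allows the doubled letter to survive merging; during training, the CTC loss sums over all frame-level alignments that collapse to the word-level label, which is why word-level transcripts are sufficient supervision.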

Scale-Residual Learning Network for Scene Text Detection

DSRN: A Deep Scale Relationship Network for Scene Text Detection

Exploring Style-Robust Scene Text Detection via Style-Aware Learning

Learning and Fusing Multi-Scale Representations for Accurate Arbitrary-Shaped Scene Text Recognition

A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion

R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection

Shape Robust Text Detection with Progressive Scale Expansion Network

A Unified Deep Neural Network For Scene Text Detection

Deep Neural Network with Attention Model for Scene Text Recognition

Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint

Scale Based Region Growing for Scene Text Detection

Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network

Synthetically Supervised Feature Learning For Scene Text Recognition

Real-Time Scene Text Detection With Differentiable Binarization and Adaptive Scale Fusion

Adaptive Segmentation Network for Scene Text Detection

MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

MSLKANet: A Multi-Scale Large Kernel Attention Network for Scene Text Removal

Natural Scene Text Detection Based on Multiscale Connectionist Text Proposal Network

Robust Seed Localization And Growing With Deep Convolutional Features For Scene Text Detection

Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection