Abstract: Existing methods for arbitrary-shaped text spotting fall into two categories: bottom-up methods detect and recognize local text regions and then group them into text lines or words, while top-down methods detect text regions of interest and then apply polygon fitting and text recognition to the detected regions. In this paper, we analyze the advantages and disadvantages of both approaches and propose a novel text spotter that fuses bottom-up and top-down processing. To detect text of arbitrary shapes, we employ a bottom-up detector that describes text with a series of rotated squares, and design a top-down detector that represents each region of interest with a minimum enclosing rotated rectangle. The text boundary is then determined by fusing the outputs of the two detectors. To connect arbitrary-shaped text detection and recognition, we propose a differentiable operator named RoISlide, which extracts features for arbitrary text regions from whole-image feature maps. On top of the features extracted by RoISlide, we introduce a CNN- and CTC-based text recognizer, which frees the framework from character-level annotations. To improve robustness against scale variance, we further propose a residual dual-scale spotting mechanism, in which two spotters operate on different feature levels and the high-level spotter is based on the residuals of the low-level spotter. Our method achieves state-of-the-art performance on four English datasets and one Chinese dataset, covering both arbitrary-shaped and oriented text. We also provide extensive ablation experiments to analyze how the key components affect performance.
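The abstract relates two geometric representations: a chain of rotated squares (bottom-up) and a minimum enclosing rotated rectangle (top-down). It does not say how the top-down detector predicts that rectangle or exactly how the two outputs are fused, so the sketch below only illustrates the geometric object itself. It computes the minimum-area rotated rectangle enclosing a point set, for example the corners of a few rotated squares, using the standard convex-hull method: the optimal rectangle always has one side collinear with a hull edge, so it suffices to try every hull edge orientation. All names here (`convex_hull`, `min_area_rect`, `square_corners`) are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points: List[Point]):
    """Return (center, (width, height), angle_deg, corners) of the minimum-area
    rotated rectangle enclosing `points` (hypothetical helper, for illustration)."""
    if not points:
        raise ValueError("need at least one point")
    hull = convex_hull(points)
    if len(hull) == 1:
        return hull[0], (0.0, 0.0), 0.0, [hull[0]] * 4
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the edge
        vx, vy = -uy, ux                                 # unit normal to the edge
        us = [px * ux + py * uy for px, py in hull]      # projections onto the edge axis
        vs = [px * vx + py * vy for px, py in hull]      # projections onto the normal axis
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            best = (area, ux, uy, vx, vy, min(us), max(us), min(vs), max(vs))
    _, ux, uy, vx, vy, u0, u1, v0, v1 = best
    # Map the rectangle from the (u, v) frame back to image coordinates: p = u * u_hat + v * v_hat.
    corners = [(a * ux + b * vx, a * uy + b * vy)
               for a, b in ((u0, v0), (u1, v0), (u1, v1), (u0, v1))]
    cu, cv = (u0 + u1) / 2, (v0 + v1) / 2
    center = (cu * ux + cv * vx, cu * uy + cv * vy)
    angle = math.degrees(math.atan2(uy, ux))
    return center, (u1 - u0, v1 - v0), angle, corners


def square_corners(cx: float, cy: float, side: float, angle_deg: float) -> List[Point]:
    """Hypothetical helper: corners of a square centered at (cx, cy), rotated by angle_deg."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    h = side / 2
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((-h, -h), (h, -h), (h, h), (-h, h))]


if __name__ == "__main__":
    # Three overlapping rotated squares along a rising line, a stand-in for bottom-up output.
    squares = [square_corners(10.0 * i, 2.0 * i, 12.0, 11.3) for i in range(3)]
    pts = [p for sq in squares for p in sq]
    center, (w, h), angle, corners = min_area_rect(pts)
    print(f"center={center}, size=({w:.2f}, {h:.2f}), angle={angle:.1f} deg")
```

For a single square rotated by 45 degrees the result is the square itself, whereas an axis-aligned bounding box would have twice the area; this tighter fit is what makes a rotated rectangle a reasonable coarse region for oriented text. For strongly curved text, the enclosing rectangle still includes background, which is consistent with the abstract's motivation for fusing it with the finer-grained square chain.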

Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing