Abstract: One trend in recent bottom-up approaches to arbitrary-shape scene text detection is to determine the links between text segments using Graph Convolutional Networks (GCNs). However, even with the help of GCNs, these bottom-up methods still perform worse than state-of-the-art top-down methods. We argue that this is partly because bottom-up methods fail to make proper use of visual-relational features, which leads to accumulated false detections, and partly because of the error-prone route-finding process used to group text segments. In this paper, we improve classic bottom-up text detection frameworks by fusing the visual-relational features of text with two effective false positive/negative suppression (FPNS) mechanisms and by developing a new shape-approximation strategy. First, dense overlapping text segments depicting the 'characterness' and 'streamline' properties of text are constructed and used in weakly supervised node classification to filter out falsely detected text segments. Then, the relational and visual features of text segments are fused by a novel Location-Aware Transfer (LAT) module and a Fuse Decoding (FD) module to jointly rectify the detected text segments. Finally, instead of the error-prone route-finding process, a novel multiple-text-map-aware contour-approximation strategy is applied to the rectified text segments to generate the final contour of each detected text instance. Experiments on five benchmark datasets demonstrate that our method, when embedded in a classic text detection framework, outperforms state-of-the-art methods, revitalizing the strengths of bottom-up methods.
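The abstract does not spell out how filtered segments are turned into text instances. The Python snippet below is a minimal sketch of a generic bottom-up "filter, then group" stage under stated assumptions, not the authors' implementation. It assumes each segment already carries a node-classification score, standing in for the weakly supervised node classifier. It also assumes each candidate pair carries a link score, such as a GCN might predict. The `Segment` class, `filter_segments`, `group_segments`, and both thresholds are hypothetical. Grouping uses union-find connected components as a simple stand-in. The LAT/FD rectification and the multiple-text-map-aware contour approximation described above are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Segment:
    # Hypothetical minimal segment record: an id plus a node-classification
    # score (assumed probability that the segment really belongs to text).
    seg_id: int
    score: float


def filter_segments(segments: List[Segment], node_thresh: float = 0.5) -> List[Segment]:
    """Drop segments whose node score is below the threshold (false-positive suppression)."""
    return [s for s in segments if s.score >= node_thresh]


def group_segments(
    segments: List[Segment],
    links: Dict[Tuple[int, int], float],
    link_thresh: float = 0.5,
) -> List[List[int]]:
    """Group surviving segments into instances via union-find over confident links."""
    parent = {s.seg_id: s.seg_id for s in segments}

    def find(x: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), p in links.items():
        # Ignore weak links and links touching segments that were filtered out.
        if p >= link_thresh and a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[int, List[int]] = {}
    for sid in parent:
        groups.setdefault(find(sid), []).append(sid)
    return sorted(sorted(g) for g in groups.values())


# Toy example: segment 2 has a low node score, so it is removed before grouping.
segments = [Segment(0, 0.9), Segment(1, 0.8), Segment(2, 0.2), Segment(3, 0.95), Segment(4, 0.7)]
links = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.6, (1, 3): 0.1}
kept = filter_segments(segments, node_thresh=0.5)
groups = group_segments(kept, links, link_thresh=0.5)
print(groups)  # [[0, 1], [3, 4]]
```

Filtering before grouping means a strong link to a rejected segment, here (1, 2), cannot merge it into an instance. This mirrors the abstract's point that suppressing false segments early limits accumulated false detections.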

Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification

Arbitrary-shaped Scene Text Detection with Keypoint-Based Shape Representation

Accurate Arbitrary-Shaped Scene Text Detection via Iterative Polynomial Parameter Regression

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

Symmetry-constrained Rectification Network for Scene Text Recognition

Robustly Recognizing Irregular Scene Text by Rectifying Principle Irregularities

Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

A Direct Regression Scene Text Detector with Position-Sensitive Segmentation

Robust Scene Text Recognition with Automatic Rectification

Sliding Line Point Regression for Shape Robust Scene Text Detection

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint

Towards End-to-End Text Spotting in Natural Scenes

ASTS: A Unified Framework for Arbitrary Shape Text Spotting

Arbitrary-shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation

Rethinking Irregular Scene Text Recognition

Arbitrary-shaped scene text detection by predicting distance map

A Scene Text Detector for Text with Arbitrary Shapes

Sequential Deformation for Accurate Scene Text Detection

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

A holistic representation guided attention network for scene text recognition