Abstract: Recent scene text detection methods widely use the shrunk polygon to localize text regions and separate adjacent instances. However, two problems remain: 1) existing methods ignore aspect-ratio sensitivity when reconstructing a text instance from its shrunk polygon; 2) texts with extreme aspect ratios cause shrunk polygons to fracture. To address both problems, we propose a novel Adaptive Dilation Network (ADNet) that focuses on the reconstruction process from the shrunk polygon and aims to provide a tight and complete text representation. First, instead of a fixed dilation factor, ADNet uses an aspect ratio-wise dilation factor to reconstruct the text region from the shrunk polygon of each text instance. This instance-wise dilation factor accounts for the scale correlation between the original and shrunk polygons, and can therefore guide adaptive text region reconstruction for texts with large aspect ratio variance. Second, to deal with fractured detection results, a new Efficient Spatial Relationship Module (ESRM) is devised to capture long-range dependencies at low computational cost. ESRM uses a novel Weighted Pooling to reduce the resolution of feature maps without much information loss. Compared with existing methods, ADNet further explores the potential of shrunk polygon-based approaches and obtains excellent detection results at an impressive speed. Extensive experiments on several datasets (Total-Text, CTW1500, MSRA-TD500 and ICDAR2015) verify the superiority of our method. Code will be available at https://github.com/qqqyd/ADNet.
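
The aspect-ratio sensitivity named in problem 1 can be illustrated with a small numerical sketch. It is not ADNet's formulation, which the abstract does not spell out. It assumes the offset convention common in earlier shrunk-polygon detectors: a polygon with area A and perimeter L is shrunk inward by d = A(1 - r^2)/L, and a shrunk polygon with area A' and perimeter L' is dilated outward by d' = A' r'/L'. The function names, the shrink ratio r = 0.4, and the restriction to axis-aligned rectangles are all hypothetical choices made for illustration.

```python
def shrink_offset(width, height, shrink_ratio=0.4):
    """Inward offset d = A * (1 - r^2) / L for an axis-aligned box (assumed rule)."""
    area = width * height
    perimeter = 2.0 * (width + height)
    return area * (1.0 - shrink_ratio ** 2) / perimeter


def required_dilation_factor(width, height, shrink_ratio=0.4):
    """Dilation factor r' that exactly undoes the shrink for a width x height box.

    Assumes the dilation offset rule d' = A' * r' / L' and solves d' = d for r'.
    """
    d = shrink_offset(width, height, shrink_ratio)
    # Shrinking an axis-aligned rectangle by d moves every side inward by d.
    shrunk_w = width - 2.0 * d
    shrunk_h = height - 2.0 * d
    shrunk_area = shrunk_w * shrunk_h
    shrunk_perimeter = 2.0 * (shrunk_w + shrunk_h)
    return d * shrunk_perimeter / shrunk_area


if __name__ == "__main__":
    # Three boxes with the same area but increasing aspect ratio.
    for w, h in [(100, 100), (200, 50), (400, 25)]:
        factor = required_dilation_factor(w, h)
        print(f"{w}x{h} (aspect ratio {w / h:g}): required factor {factor:.3f}")
```

Under these assumptions the factor needed to recover the original box grows from roughly 1.45 for the square to roughly 3.8 for the 16:1 box, so a single fixed dilation factor cannot reconstruct both tightly. This is the motivation the abstract gives for an instance-wise factor.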

Polygon-free: Unconstrained Scene Text Detection with Box Annotations

Progressive Evolution from Single-Point to Polygon for Scene Text

Texts as Points: Scene Text Detection with Point Supervision

BoxSnake: Polygonal Instance Segmentation with Box Supervision

A Scene Text Detector for Text with Arbitrary Shapes

Box2Poly: Memory-Efficient Polygon Prediction of Arbitrarily Shaped and Rotated Text

Self-supervised Scene Text Segmentation with Object-centric Layered Representations Augmented by Text Regions

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

Fully Data-Driven Pseudo Label Estimation for Pointly-Supervised Panoptic Segmentation

Exploring the Capacity of Sequential-free Box Discretization Network for Omnidirectional Scene Text Detection

Semi-Supervised Pixel-Level Scene Text Segmentation by Mutually Guided Network

Semi-Supervised Text Detection with Accurate Pseudo-Labels

CPN: Complementary Proposal Network for Unconstrained Text Detection

Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint

ContourNet: Taking a Further Step Toward Accurate Arbitrary-shaped Scene Text Detection

OPMP: An Omnidirectional Pyramid Mask Proposal Network for Arbitrary-Shape Scene Text Detection

Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation

Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors

Annotating Object Instances with a Polygon-RNN

Arbitrary-shaped scene text detection by predicting distance map