Abstract: In arbitrary shape text detection, locating accurate text boundaries is challenging. Existing methods often suffer from indirect text boundary modeling or complex post-processing. In this paper, we present a unified coarse-to-fine framework via boundary learning for arbitrary shape text detection, which can accurately and efficiently locate text boundaries without post-processing. Our method explicitly models the text boundary via an innovative iterative boundary transformer in a coarse-to-fine manner, so it obtains accurate text boundaries directly and drops complex post-processing to improve efficiency. Specifically, our method consists mainly of a feature extraction backbone, a boundary proposal module, and an iteratively optimized boundary transformer module. The boundary proposal module, which consists of multi-layer dilated convolutions, predicts prior information (a classification map, a distance field, and a direction field) that is used to generate coarse boundary proposals and to guide the boundary transformer's optimization. The boundary transformer module adopts an encoder-decoder structure, in which the encoder is built from multi-layer transformer blocks with residual connections and the decoder is a simple multi-layer perceptron (MLP). Under the guidance of the prior information, the boundary transformer module gradually refines the coarse boundary proposals via iterative boundary deformation. Furthermore, we propose a novel boundary energy loss (BEL) that introduces an energy minimization constraint and a monotonically decreasing energy constraint to further optimize and stabilize the learning of boundary refinement. Extensive experiments on publicly available and challenging datasets demonstrate the state-of-the-art performance and promising efficiency of our method. The code and model are available at: https://github.com/GXYM/TextBPN-Puls-Plus.
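The coarse-to-fine refinement and the two constraints behind the boundary energy loss can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not give the exact form of BEL, so the loss below (mean energy plus a hinge penalty on any energy increase between iterations) is an assumed form, and `refine_boundary`, `boundary_energy_loss`, `toy_offsets`, and `toy_energy` are hypothetical names. A hand-written offset rule stands in for the learned boundary transformer, and the distance to a unit circle stands in for the predicted distance field.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def refine_boundary(
    points: Sequence[Point],
    predict_offsets: Callable[[Sequence[Point]], List[Point]],
    num_iters: int = 3,
) -> List[List[Point]]:
    """Deform the boundary num_iters times; return the boundary after each step."""
    history: List[List[Point]] = []
    current = list(points)
    for _ in range(num_iters):
        offsets = predict_offsets(current)
        current = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, offsets)]
        history.append(current)
    return history


def boundary_energy_loss(energies: Sequence[float], weight: float = 1.0) -> float:
    """Assumed form of BEL, for illustration only.

    minimization term: mean energy over all iterations
    monotonic term:    hinge penalty whenever energy rises between iterations
    """
    minimization = sum(energies) / len(energies)
    monotonic = sum(
        max(0.0, later - earlier) for earlier, later in zip(energies, energies[1:])
    )
    return minimization + weight * monotonic


def toy_offsets(points: Sequence[Point]) -> List[Point]:
    """Stand-in for the learned module: move each point halfway to the unit circle."""
    offsets = []
    for x, y in points:
        r = math.hypot(x, y)
        scale = 0.5 * (1.0 - r) / r
        offsets.append((x * scale, y * scale))
    return offsets


def toy_energy(points: Sequence[Point]) -> float:
    """Stand-in energy: mean distance of the boundary points from the unit circle."""
    return sum(abs(math.hypot(x, y) - 1.0) for x, y in points) / len(points)


# Coarse proposal: four control points at radius 2 around the target circle.
coarse = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
history = refine_boundary(coarse, toy_offsets, num_iters=3)
energies = [toy_energy(boundary) for boundary in history]
loss = boundary_energy_loss(energies)
```

In this toy setting each iteration halves the remaining distance to the target, so the energies shrink step by step and the monotonic term contributes nothing; a refinement step that moved the boundary away from the target would be penalized by that term.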

Transformer-Convolution Network for Arbitrary Shape Text Detection

CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer

A Transformer-Based Object Detector with Coarse-Fine Crossing Representations

Arbitrary Shape Text Detection via Boundary Transformer

Arbitrarily Shaped Scene Text Detection with Dynamic Convolution

ContourNet: Taking a Further Step Toward Accurate Arbitrary-shaped Scene Text Detection

Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

Tk-Text: Multi-Shaped Scene Text Detection Via Instance Segmentation

Arbitrary-Shaped Text Detection with Adaptive Text Region Representation

Transforming Scene Text Detection and Recognition: A Multi-Scale End-to-End Approach With Transformer Framework

Shape Robust Text Detection with Progressive Scale Expansion Network

CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes

DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer

Deformable Kernel Expansion Model for Efficient Arbitrary-shaped Scene Text Detection

Arbitrary-shape Scene Text Detection via Visual-Relational Rectification and Contour Approximation

Learning Pixel Affinity Pyramid for Arbitrary-Shaped Text Detection

Aggregated Text Transformer for Scene Text Detection

I3CL: Intra- and Inter-Instance Collaborative Learning for Arbitrary-Shaped Scene Text Detection

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

Geometry-Aware Scene Text Detection with Instance Transformation Network