Abstract: End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention because it handles two complementary tasks in a single, simple pipeline. It remains an open problem, especially for arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups, character-based and segmentation-based, which often require character-level annotations and/or complex post-processing because their output is unstructured. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolution features of text instances of arbitrary shape, significantly improving recognition precision over previous methods. 3) Unlike previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, ABCNet v2 maintains a simple pipeline whose only post-processing step is non-maximum suppression (NMS). 4) As the performance of text recognition closely depends on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the position of the convolutional filters, which leads to a considerable improvement with negligible computation overhead. Comprehensive experiments conducted on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 can achieve state-of-the-art performance while maintaining very high efficiency. More importantly, as there is little work on quantization of text spotting models, we quantize our models to reduce the inference time of the proposed ABCNet v2. This can be valuable for real-time applications.
Code and model are available at: https://git.io/AdelaiDet.
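
To make the idea of a parameterized Bezier curve concrete, the following is a minimal sketch of how one boundary of a curved text instance could be described by a few control points and then sampled. It is not taken from the released code: the function names `bezier_point` and `sample_curve`, the choice of a cubic curve (four control points), and the coordinates are all assumptions made for illustration, since the abstract does not state the curve degree.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1  # curve degree
    x = y = 0.0
    for i, (px, py) in enumerate(control_points):
        # Bernstein polynomial B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)
        b = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += b * px
        y += b * py
    return (x, y)


def sample_curve(control_points, num_samples):
    """Return num_samples points spaced uniformly in t along the curve."""
    if num_samples < 2:
        raise ValueError("num_samples must be at least 2")
    return [
        bezier_point(control_points, k / (num_samples - 1))
        for k in range(num_samples)
    ]


# Hypothetical control points for the top edge of a curved word
# (assumption: a cubic curve, i.e. four control points).
top_edge = [(0.0, 0.0), (10.0, 5.0), (20.0, 5.0), (30.0, 0.0)]
samples = sample_curve(top_edge, 5)
```

Because the whole boundary is determined by a handful of control points, the detector's output is a short, fixed-length vector rather than a pixel mask, which is one way to read the "structured output" claim in the abstract.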

An End-to-End TextSpotter with Explicit Alignment and Attention

Single Shot TextSpotter with Explicit Alignment and Attention

Towards End-to-End Text Spotting in Natural Scenes

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

ASTS: A Unified Framework for Arbitrary Shape Text Spotting

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks

TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

Bridging the Gap Between End-to-End and Two-Step Text Spotting

SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Real-time End-to-End Video Text Spotter with Contrastive Representation Learning

DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training

Residual Dual Scale Scene Text Spotting by Fusing Bottom-Up and Top-Down Processing

Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

LATextSpotter: Empowering Transformer Decoder with Length Perception Ability

TDI TextSpotter: Taking Data Imbalance into Account in Scene Text Spotting