Abstract: Reading text from images remains challenging due to multi-orientation, perspective distortion, and especially the curved nature of irregular text. Most existing approaches attempt to solve the problem in two or more stages, which is considered a bottleneck for optimizing overall performance. To address this issue, we propose an end-to-end trainable network architecture, named TextNet, which is able to simultaneously localize and recognize irregular text in images. Specifically, we develop a scale-aware attention mechanism to learn multi-scale image features as a backbone network, sharing fully convolutional features and computation between localization and recognition. In the text detection branch, we directly generate text proposals as quadrangles, covering oriented, perspective, and curved text regions. To preserve text features for recognition, we introduce a perspective RoI transform layer, which aligns quadrangle proposals into small feature maps. Furthermore, to extract effective features for recognition, we propose to encode the aligned RoI features into context information with an RNN, combined with a spatial attention mechanism to generate text sequences. The overall pipeline is capable of handling both regular and irregular cases. Finally, the text localization and recognition tasks can be jointly trained in an end-to-end fashion with a designed multi-task loss. Experiments on standard benchmarks show that the proposed TextNet achieves state-of-the-art performance and outperforms existing approaches on irregular datasets by a large margin.
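The idea of aligning a quadrangle proposal into a small, fixed-size feature map can be illustrated with a minimal NumPy sketch. It is not the authors' layer: the function names, the 8x32 output size, the corner ordering (top-left, top-right, bottom-right, bottom-left), and the use of plain bilinear sampling are all assumptions made for illustration, and a real layer would run on batched tensors inside a differentiable framework.

```python
import numpy as np


def homography_from_quad(out_w, out_h, quad):
    """Solve for the 3x3 matrix mapping output-grid corners onto the quad corners.

    quad: four (x, y) points, assumed ordered TL, TR, BR, BL (hypothetical convention).
    """
    src = np.array(
        [[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]],
        dtype=np.float64,
    )
    dst = np.asarray(quad, dtype=np.float64)
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # x = (h0*u + h1*v + h2) / (h6*u + h7*v + 1), and likewise for y.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.extend([x, y])
    h = np.linalg.solve(np.array(rows), np.array(rhs))
    return np.append(h, 1.0).reshape(3, 3)


def perspective_roi_transform(feat, quad, out_h=8, out_w=32):
    """Warp the quad region of feat (H, W, C) into an (out_h, out_w, C) map."""
    hmat = homography_from_quad(out_w, out_h, quad)
    us, vs = np.meshgrid(np.arange(out_w), np.arange(out_h))
    grid = np.stack([us, vs, np.ones_like(us)], axis=-1).astype(np.float64)
    mapped = grid @ hmat.T
    # Perspective divide, then clamp sample locations to the feature map.
    x = np.clip(mapped[..., 0] / mapped[..., 2], 0, feat.shape[1] - 1)
    y = np.clip(mapped[..., 1] / mapped[..., 2], 0, feat.shape[0] - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, feat.shape[1] - 1)
    y1 = np.minimum(y0 + 1, feat.shape[0] - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]
    # Bilinear interpolation between the four neighbouring feature vectors.
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# Toy feature map whose two channels store the x and y coordinate of each cell.
ys, xs = np.mgrid[0:16, 0:64]
coords = np.stack([xs, ys], axis=-1).astype(np.float64)
quad = [(20, 2), (50, 6), (48, 14), (18, 10)]
aligned = perspective_roi_transform(coords, quad)
```

Because the toy feature map stores its own coordinates, the four corners of `aligned` recover the four corners of `quad`, which is a quick way to check that the warp points at the intended region.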

Scene Text Recognition With Finer Grid Rectification

ESIR: End-To-End Scene Text Recognition via Iterative Image Rectification

Robustly Recognizing Irregular Scene Text by Rectifying Principle Irregularities

Rethinking Irregular Scene Text Recognition

Robust Scene Text Recognition with Automatic Rectification

Symmetry-constrained Rectification Network for Scene Text Recognition

A Multi-Object Rectified Attention Network for Scene Text Recognition

A Two-level Rectification Attention Network for Scene Text Recognition

Robust Scene Text Recognition Through Adaptive Image Enhancement

Research on Scene Text Recognition Algorithm Based on Improved CRNN

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

A holistic representation guided attention network for scene text recognition

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

A irregular text detection via dilated recombination and efficient reorganization on natural scene

Reading Arbitrary-Shaped Scene Text from Images Through Spline Regression and Rectification

2D Attentional Irregular Scene Text Recognizer

Scene Text Recognition Via Gated Cascade Attention

Alchemy: Techniques for Rectification Based Irregular Scene Text Recognition

Focus-Enhanced Scene Text Recognition with Deformable Convolutions