Abstract: Irregular text is widely used. However, it is considerably difficult to recognize because of its various shapes and distorted patterns. In this paper, we thus propose a multi-object rectified attention network (MORAN) for general scene text recognition. The MORAN consists of a multi-object rectification network and an attention-based sequence recognition network. The multi-object rectification network is designed for rectifying images that contain irregular text. It decreases the difficulty of recognition and enables the attention-based sequence recognition network to more easily read irregular text. It is trained in a weak supervision way, thus requiring only images and corresponding text labels. The attention-based sequence recognition network focuses on target characters and sequentially outputs the predictions. Moreover, to improve the sensitivity of the attention-based sequence recognition network, a fractional pickup method is proposed for an attention-based decoder in the training phase. With the rectification mechanism, the MORAN can read both regular and irregular scene text. Extensive experiments on various benchmarks are conducted, which show that the MORAN achieves state-of-the-art performance. The source code is available.

What problem does this paper attempt to address?

The paper addresses the difficulty of accurately recognizing irregular text in scene images, which arises from its variable shapes and distortion patterns. It proposes a multi-object rectified attention network (MORAN) for general scene text recognition. MORAN consists of a multi-object rectification network (MORN) and an attention-based sequence recognition network (ASRN). MORN rectifies images that contain irregular text and so reduces the recognition difficulty, while ASRN focuses on the target characters and outputs the predictions sequentially. To improve the sensitivity of the ASRN, the paper also proposes a fractional pickup method that is applied to the attention-based decoder during the training phase. With these mechanisms, MORAN can read both regular and irregular scene text and achieves state-of-the-art performance on multiple benchmarks. The main contributions of the paper are:
1. The MORAN framework for recognizing irregular scene text, which combines MORN and ASRN. Images rectified by MORN are easier for ASRN to recognize.
2. MORN is trained in a weakly supervised manner, requiring only images and their text labels. It is flexible, is not restricted by geometric constraints, and can rectify images with complex deformations.
3. A fractional pickup method for training the attention-based decoder in ASRN, which improves robustness to context changes.
4. A curriculum learning strategy that enables MORAN to learn efficiently. Trained with this strategy, MORAN surpasses existing methods on multiple standard text recognition benchmarks: IIIT5K, SVT, ICDAR2003, ICDAR2013, ICDAR2015, SVT-Perspective and CUTE80.
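
The fractional pickup method named in the abstract is not specified in detail on this page. The sketch below illustrates one plausible reading of it, under the assumption that, at a decoding step during training, a random pair of adjacent attention weights is blended with a random coefficient. The function name `fractional_pickup`, the `rng` parameter, and the exact mixing formula are assumptions made for illustration and are not taken from the text above.

```python
import random


def fractional_pickup(weights, rng=random):
    """Blend one randomly chosen pair of adjacent attention weights.

    Assumed form (not stated in the summary above):
        a[k]   <- beta * a[k] + (1 - beta) * a[k+1]
        a[k+1] <- (1 - beta) * a[k] + beta * a[k+1]
    with k drawn uniformly over adjacent pairs and beta drawn from [0, 1).
    """
    if len(weights) < 2:
        return list(weights)  # no adjacent pair to blend

    k = rng.randrange(len(weights) - 1)  # index of the left element of the pair
    beta = rng.random()                  # random mixing coefficient

    mixed = list(weights)                # leave the input untouched
    left, right = weights[k], weights[k + 1]
    mixed[k] = beta * left + (1 - beta) * right
    mixed[k + 1] = (1 - beta) * left + beta * right
    return mixed


if __name__ == "__main__":
    attention = [0.1, 0.6, 0.2, 0.1]
    print(fractional_pickup(attention, random.Random(0)))
```

Under this assumed formula the two blended weights keep their combined sum, so a normalized attention distribution stays normalized. The blending lets a neighboring feature position contribute to the current decoding step, which is one way to read the claim of improved robustness to context changes.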

A Multi-Object Rectified Attention Network for Scene Text Recognition

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

A Two-level Rectification Attention Network for Scene Text Recognition

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

A holistic representation guided attention network for scene text recognition

Multi-branch guided attention network for irregular text recognition

Robust Scene Text Recognition with Automatic Rectification

2D Attentional Irregular Scene Text Recognizer

A Simple and Strong Convolutional-Attention Network for Irregular Text Recognition

Robustly Recognizing Irregular Scene Text by Rectifying Principle Irregularities

Symmetry-constrained Rectification Network for Scene Text Recognition

Character Region Awareness Network for Scene Text Recognition

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

STAN: A sequential transformation attention-based network for scene text recognition

Deep Neural Network with Attention Model for Scene Text Recognition

A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application

CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition

TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

Irregular Scene Text Detection Via Attention Guided Border Labeling

A Feasible Framework for Arbitrary-Shaped Scene Text Recognition

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network