Multi-granularity Deep Local Representations for Irregular Scene Text Recognition

Hongchao Gao, Yujia Li, Jiao Dai, Xi Wang, Jizhong Han, Ruixuan Li
DOI: https://doi.org/10.1145/3446971
2021-01-01
ACM/IMS Transactions on Data Science
Abstract: Recognizing irregular text in natural scene images is challenging because of the unconstrained appearance of text, such as curvature, orientation, and distortion. Recent recognition networks treat this task as a text sequence labeling problem, and most capture the sequence from only a single-granularity visual representation, which limits recognition performance to some extent. In this article, we propose a hierarchical attention network that captures multi-granularity deep local representations for recognizing irregular scene text. It consists of several hierarchical attention blocks, each containing a Local Visual Representation Module (LVRM) and a Decoder Module (DM). Based on the hierarchical attention network, we propose a scene text recognition network. Extensive experiments show that the proposed network achieves state-of-the-art performance on several benchmark datasets, including IIIT-5K, SVT, CUTE, SVT-Perspective, and the ICDAR datasets, with shorter training time.