EPAN: Effective Parts Attention Network for Scene Text Recognition

Yunlong Huang, Zenghui Sun, Lianwen Jin, Canjie Luo
DOI: https://doi.org/10.1016/j.neucom.2019.10.010
IF: 6
2020-01-01
Neurocomputing
Abstract: For most previous attention-based scene text recognition methods, images are transformed into high-level feature vectors that form a feature map with height equal to one. Such vectors may contain unnecessary noise that limits recognition performance. To address this issue, we propose the effective parts attention network (EPAN), which attentively highlights the character region for more precise recognition. EPAN consists of a text image encoder and a character effective parts decoder (CEPD), and it is end-to-end trainable. The encoder separates the high-dimensional feature map into one-dimensional vectors row by row, which are fed to a bidirectional long short-term memory unit to encode contextual information. The CEPD then transforms the vectors using a novel glimpse network at each time step to roughly determine the position of the characters, and uses a refinement network to generate a mask that gradually localizes the precise position of the important parts of the current character. Experiments were conducted on various benchmarks, including IIIT5K-Words, Street View Text, ICDAR 2003, ICDAR 2013, CUTE80, Street View Text Perspective, and ICDAR 2015, and demonstrated that the proposed EPAN method significantly outperformed or was comparable to existing methods in terms of lexicon-free word accuracy. Additionally, substantial qualitative results further demonstrated the robustness of our method. © 2019 The Authors. Published by Elsevier B.V.
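The coarse-then-refine decoding idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`rows_to_sequence`, `glimpse_then_refine`), the dot-product scoring, the sigmoid mask, and the random weights are all assumptions chosen for illustration, and the bidirectional LSTM is omitted for brevity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def rows_to_sequence(feature_map):
    """Flatten a (C, H, W) feature map row by row into H*W vectors of size C."""
    c, h, w = feature_map.shape
    return feature_map.transpose(1, 2, 0).reshape(h * w, c)

def glimpse_then_refine(seq, query, w_mask):
    """Hypothetical two-stage attention: coarse weights, then a position mask."""
    # Stage 1 (assumed "glimpse"): coarse attention over all positions.
    alpha = softmax(seq @ query)               # (N,)
    glimpse = alpha @ seq                      # (C,)
    # Stage 2 (assumed "refinement"): a sigmoid mask in (0, 1) per position,
    # conditioned on the glimpse, that down-weights irrelevant positions.
    mask = 1.0 / (1.0 + np.exp(-(seq @ (w_mask @ glimpse))))  # (N,)
    refined_alpha = alpha * mask
    refined_alpha = refined_alpha / refined_alpha.sum()
    return alpha, refined_alpha, refined_alpha @ seq

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 16))         # toy (C, H, W) feature map
seq = rows_to_sequence(fmap)                   # (64, 8)
query = rng.standard_normal(8)                 # stand-in for a decoder state
w_mask = rng.standard_normal((8, 8))           # stand-in for learned weights
alpha, refined_alpha, context = glimpse_then_refine(seq, query, w_mask)
```

In this sketch `context` is the feature vector a character classifier would consume at one time step; in a trained model the query and mask weights would be learned rather than random.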