A holistic representation guided attention network for scene text recognition

Lu Yang, Peng Wang, Hui Li, Zhen Li, Yanning Zhang
DOI: https://doi.org/10.1016/j.neucom.2020.07.010
IF: 6
2020-11-01
Neurocomputing
Abstract: Reading irregular scene text of arbitrary shape in natural images remains a challenging problem, despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel. It achieves 1.5× to 9.4× acceleration of the backward pass and 1.3× to 7.9× acceleration of the forward pass, compared with its RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.
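The idea of attending directly over two-dimensional CNN features with a query guided by a holistic representation can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the holistic representation is computed or how it guides the decoder, so the global average pooling, the additive guidance, the scaled dot-product scoring, and the names `holistic_vector` and `guided_attention` are all assumptions made for illustration.

```python
import math


def holistic_vector(feature_map):
    """Average all H*W positions into one C-dimensional vector.

    Assumption: global average pooling stands in for the paper's
    holistic representation, which the abstract does not specify.
    """
    cells = [cell for row in feature_map for cell in row]
    channels = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]


def guided_attention(feature_map, decoder_state):
    """Attend over a 2D feature map (H x W x C nested lists) without flattening
    the image into a sequence first. Returns (weights, glimpse)."""
    holistic = holistic_vector(feature_map)
    # Assumption: guidance is a plain sum of decoder state and holistic vector.
    query = [s + h for s, h in zip(decoder_state, holistic)]
    scale = math.sqrt(len(query))

    # Scaled dot-product score for every spatial position.
    scores = [[sum(q * x for q, x in zip(query, cell)) / scale for cell in row]
              for row in feature_map]

    # Softmax over all H*W positions (subtract the max for numerical stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]

    # Glimpse: attention-weighted sum of the feature vectors.
    glimpse = [sum(weights[i][j] * feature_map[i][j][c]
                   for i in range(len(feature_map))
                   for j in range(len(feature_map[0])))
               for c in range(len(query))]
    return weights, glimpse


# Toy 2 x 2 feature map with 2 channels (made-up values).
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[2.0, 2.0], [0.0, 0.0]]]
weights, glimpse = guided_attention(feature_map, [0.5, 0.5])
```

Because each decoding step in this sketch depends only on the feature map and a query vector, not on a recurrent hidden state carried through the image, queries for several output positions could be evaluated independently, which is the property the abstract credits for parallel training.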
computer science, artificial intelligence