Flexible scene text recognition based on dual attention mechanism

Zhiqiang Tian, Chunhui Wang, Youzi Xiao, Yuping Lin
DOI: https://doi.org/10.1002/cpe.5863
2020-06-03
Concurrency and Computation: Practice and Experience
Abstract: Scene text recognition (STR), which extracts text from complex natural scenes, is a popular topic in computer vision. In this article, we propose an end‐to‐end trainable and flexible STR method based on a dual attention mechanism. The proposed method consists of four modules: a thin plate spline transformer that normalizes the input image, a Channel‐Att feature extractor that obtains representative features, a bidirectional long short‐term memory encoder that encodes sequential context features, and a Self‐Att based decoder that predicts text labels. Results on seven benchmark datasets (IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE) show that the proposed method is comparable to 13 existing methods. In particular, its average text recognition accuracy is about 1.4% higher than that of the state‐of‐the‐art method.
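The abstract does not say how the Channel‐Att feature extractor weights its channels. As a rough illustration only, the sketch below assumes a squeeze-and-excitation style gate (global average pooling, two small fully connected layers, and a sigmoid), which is one common form of channel attention and may differ from the authors' design. The function name `channel_attention` and the weight matrices `w1` and `w2` are hypothetical.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Rescale each channel of a feature map by a gate in (0, 1).

    feature_map: list of C channels, each an H x W list of lists.
    w1: (C/r) x C weight matrix, w2: C x (C/r) weight matrix.
    Both stand in for learned parameters and are assumptions of this sketch.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: fully connected layer followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Second fully connected layer followed by a sigmoid, one gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Rescale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_map)
    ]
    return scaled, gates
```

With all-zero weights in `w2`, every gate equals 0.5, so each channel is simply halved; with trained weights the gates would differ per channel and emphasize the more informative ones.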