Enhancing Scene Text Recognition by Strengthening Attention Alignment

Lin Lin, Liangwei Chen, Shan Qing, Jiake Zhang
DOI: https://doi.org/10.1109/ICAIBD57115.2023.10206338
2023-05-26
Abstract: Scene text recognition (STR) is the task of recognizing text instance images from natural scenes as text. STR has been an active area of computer vision because of its many applications. State-of-the-art methods follow the attention-based encoder-decoder framework. However, most attention-based methods suffer from attention drift in many situations. We find that a significant cause of attention drift is that the extracted visual features are insufficient. To alleviate this problem, we propose an attention-tuned model that strengthens attention alignment by learning better visual features. We first adopt deformable convolution to extract visual features. Next, a stacked self-attention architecture with intermediate supervision refines the representation of these features. In addition, spatial position encoding is used to guide the attention alignment operation. Extensive experiments on standard benchmarks demonstrate that the proposed model achieves state-of-the-art performance for both regular and irregular scene text recognition.
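As an illustration of the spatial position encoding mentioned in the abstract, the following is a minimal sketch only. The abstract does not specify the encoding, so this assumes a fixed 2D sinusoidal scheme in the style commonly used with self-attention models: half of the channels encode the row index and half encode the column index. The function names `sinusoidal_1d`, `spatial_position_encoding`, and `add_position` are hypothetical, and the authors' actual formulation may differ.

```python
import math


def sinusoidal_1d(position, dim):
    """Return a `dim`-length sinusoidal code for one integer position."""
    code = []
    for i in range(0, dim, 2):
        # Lower channel indices use higher frequencies (assumed base 10000).
        freq = 1.0 / (10000 ** (i / dim))
        code.append(math.sin(position * freq))
        code.append(math.cos(position * freq))
    return code[:dim]


def spatial_position_encoding(height, width, channels):
    """Build an H x W grid of codes.

    The first half of the channels encodes the row (y), the second half
    encodes the column (x). This split is an assumption of this sketch.
    """
    if channels % 2 != 0:
        raise ValueError("channels must be even")
    half = channels // 2
    rows = [sinusoidal_1d(y, half) for y in range(height)]
    cols = [sinusoidal_1d(x, half) for x in range(width)]
    return [[rows[y] + cols[x] for x in range(width)] for y in range(height)]


def add_position(features, encoding):
    """Add the encoding element-wise to an H x W x C feature map (nested lists)."""
    return [
        [[f + p for f, p in zip(fvec, pvec)] for fvec, pvec in zip(frow, prow)]
        for frow, prow in zip(features, encoding)
    ]
```

Under these assumptions, `add_position(features, spatial_position_encoding(H, W, C))` gives every location of the feature map a distinct, position-dependent offset before attention is computed, which is one plausible way for position information to guide alignment.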
Computer Science