Abstract: Optical degradation blurs text shapes and edges, so existing scene text recognition methods struggle to achieve good results on low-resolution (LR) scene text images acquired in real-world environments. This problem can be addressed by efficiently extracting sequential information to reconstruct super-resolution (SR) text images, which remains a challenging task. In this paper, we propose a Parallelly Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses more on the high-frequency information needed to reconstruct text images. First, we explore the importance of learning sequence-dependent features in the horizontal and vertical directions in parallel for text SR, and design a parallelly contextual attention block that adaptively selects the key information in the text sequence that contributes to image super-resolution. Second, we propose a hierarchically orthogonal texture-aware attention module and an edge guidance loss function, which help to reconstruct high-frequency information in text images. Finally, we conduct extensive experiments on the TextZoom dataset; the results show that our method can be easily incorporated into mainstream text recognition algorithms to further improve their performance on LR image recognition. In addition, our approach is highly robust against adversarial attacks on seven mainstream scene text recognition datasets, which means it can also improve the security of the text recognition pipeline. Compared with directly recognizing LR images, our method improves the recognition accuracy of ASTER, MORAN, and CRNN by 14.9%, 14.0%, and 20.1%, respectively. Our method outperforms eleven state-of-the-art (SOTA) SR methods in terms of boosting text recognition performance. Most importantly, it outperforms the best existing text-oriented SR method, TSRN, by 3.2%, 3.7%, and 6.0% on the recognition accuracy of ASTER, MORAN, and CRNN, respectively.
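The abstract names an edge guidance loss but does not define it. As a rough illustration of the general idea only, the sketch below penalizes the difference between edge maps of an SR output and its high-resolution (HR) target. Everything here is an assumption and not the paper's formulation: the choice of Sobel gradients, the mean-absolute-difference form, and the hypothetical names `sobel_edges` and `edge_guidance_loss`.

```python
# Illustrative sketch only: the abstract does not specify the edge guidance loss.
# Assumptions (not from the paper): edges are Sobel gradient magnitudes, and the
# loss is the mean absolute difference between the SR and HR edge maps.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    `img` is a list of equal-length rows of floats. Border pixels are
    skipped, so the result has shape (H - 2) x (W - 2).
    """
    h, w = len(img), len(img[0])
    edges = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            row.append((gx * gx + gy * gy) ** 0.5)
        edges.append(row)
    return edges


def edge_guidance_loss(sr, hr):
    """Mean absolute difference between the edge maps of `sr` and `hr`."""
    e_sr, e_hr = sobel_edges(sr), sobel_edges(hr)
    total = sum(
        abs(a - b)
        for row_sr, row_hr in zip(e_sr, e_hr)
        for a, b in zip(row_sr, row_hr)
    )
    return total / (len(e_sr) * len(e_sr[0]))


# Toy example: an HR patch with a sharp vertical stroke edge versus a
# completely flat (edge-free) reconstruction of the same size.
hr_patch = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
flat_patch = [[0.5] * 4 for _ in range(4)]
print(edge_guidance_loss(hr_patch, hr_patch))    # identical edges, zero loss
print(edge_guidance_loss(flat_patch, hr_patch))  # missing edge is penalized
```

A term of this kind is typically added to a pixel-wise reconstruction loss so that blurred character strokes cost more than they would under a pixel loss alone. How PCAN actually weights or computes its edge term is not stated in the abstract.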

NASTER: Non-local Attentional Scene Text Recognizer

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Scene Text Recognition with Cascade Attention Network

Deep Neural Network with Attention Model for Scene Text Recognition

A holistic representation guided attention network for scene text recognition

Character Region Awareness Network for Scene Text Recognition

Cascade 2D attentional decoders with context-enhanced encoder for scene text recognition

Hierarchical Refined Attention for Scene Text Recognition

STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition

NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition

Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling

Scene Text Recognition Via Gated Cascade Attention

ReADS: A Rectified Attentional Double Supervised Network for Scene Text Recognition

STN-OCR: A single Neural Network for Text Detection and Text Recognition

Scene Chinese Recognition with Local and Global Attention

SCATTER: Selective Context Attentional Scene Text Recognizer

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network

A Feasible Framework for Arbitrary-Shaped Scene Text Recognition

Context-Based Contrastive Learning for Scene Text Recognition

HAFE: A Hierarchical Awareness and Feature Enhancement Network for Scene Text Recognition

GLaLT: Global-Local Attention-Augmented Light Transformer for Scene Text Recognition