Abstract: Optical degradation blurs the shapes and edges of text, so existing scene text recognition methods struggle on low-resolution (LR) scene text images captured in real-world environments. Reconstructing super-resolution (SR) text images by efficiently extracting sequential information can address this problem, but it remains a challenging task. In this paper, we propose a Parallelly Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses on the high-frequency information needed to reconstruct text images. First, we explore the importance for text SR of sequence-dependent features extracted in parallel along the horizontal and vertical directions, and design a parallelly contextual attention block that adaptively selects the key information in the text sequence that contributes to image super-resolution. Second, we propose a hierarchically orthogonal texture-aware attention module and an edge guidance loss function, which help reconstruct high-frequency information in text images. Finally, we conduct extensive experiments on the TextZoom dataset; the SR results can be easily incorporated into mainstream text recognition algorithms to further improve their performance on LR images. In addition, our approach is robust against adversarial attacks on seven mainstream scene text recognition datasets, which means it can also improve the security of the text recognition pipeline. Compared with directly recognizing LR images, our method improves the recognition accuracy of ASTER, MORAN, and CRNN by 14.9%, 14.0%, and 20.1%, respectively. It outperforms eleven state-of-the-art (SOTA) SR methods in boosting text recognition performance. Most importantly, it outperforms TSRN, the current best text-oriented SR method, by 3.2%, 3.7%, and 6.0% in recognition accuracy with ASTER, MORAN, and CRNN, respectively.
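The abstract names an edge guidance loss but does not define it. As a rough illustration only, the sketch below assumes one common form of such a loss: a pixel-wise L1 term plus a weighted L1 term between Sobel edge maps of the SR output and the high-resolution (HR) target. The function names, the Sobel operator, and the `weight` value are all hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

# 3x3 Sobel kernels for horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_map(img: Image) -> Image:
    """Sobel gradient magnitude over interior pixels (no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = sum(SOBEL_X[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * img[i - 1 + a][j - 1 + b]
                     for a in range(3) for b in range(3))
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def edge_guided_loss(sr: Image, hr: Image, weight: float = 0.1) -> float:
    """Pixel L1 plus weighted L1 between edge maps (assumed form, not the paper's)."""
    return l1(sr, hr) + weight * l1(edge_map(sr), edge_map(hr))


if __name__ == "__main__":
    hr = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # sharp vertical stroke edge
    sr = [[0.5] * 4 for _ in range(4)]             # fully blurred reconstruction
    print(edge_guided_loss(sr, hr))
```

Under this assumed form, a blurred reconstruction is penalized twice: once for the pixel error and once for the missing edge response, which is the intuition behind steering a network toward high-frequency detail.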

PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution

Improving Scene Text Image Super-resolution via Dual Prior Modulation Network

Gradient-Based Graph Attention for Scene Text Image Super-resolution

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network

ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes

Text-Enhanced Scene Image Super-Resolution via Stroke Mask and Orthogonal Attention

TextSRNet: Scene Text Super-Resolution Based on Contour Prior and Atrous Convolution

Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Text Prior Guided Scene Text Image Super-Resolution

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition

Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Scene Text Image Super-Resolution in the Wild

ESTGN: Enhanced Self-Mined Text Guided Super-Resolution Network for Superior Image Super Resolution

Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

A Benchmark for Chinese-English Scene Text Image Super-resolution

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution