Abstract: Optical degradation blurs text shapes and edges, so existing scene text recognition methods struggle to achieve desirable results on low-resolution (LR) scene text images acquired in real-world environments. This problem can be addressed by efficiently extracting sequential information to reconstruct super-resolution (SR) text images, which remains a challenging task. In this paper, we propose a Parallelly Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses more on the high-frequency information needed to reconstruct text images. First, we explore the importance of sequence-dependent features in the horizontal and vertical directions, processed in parallel, for text SR, and then design a parallelly contextual attention block that adaptively selects the key information in the text sequence that contributes to image super-resolution. Second, we propose a hierarchically orthogonal texture-aware attention module and an edge guidance loss function, which help to reconstruct high-frequency information in text images. Finally, we conduct extensive experiments on the TextZoom dataset and show that our method can be easily incorporated into mainstream text recognition algorithms to further improve their performance on LR image recognition. In addition, our approach is highly robust against adversarial attacks on seven mainstream scene text recognition datasets, which means it can also improve the security of the text recognition pipeline. Compared with directly recognizing LR images, our method improves the recognition accuracy of ASTER, MORAN, and CRNN by 14.9%, 14.0%, and 20.1%, respectively. Our method outperforms eleven state-of-the-art (SOTA) SR methods in terms of boosting text recognition performance. Most importantly, it outperforms the current best text-oriented SR method, TSRN, by 3.2%, 3.7%, and 6.0% on the recognition accuracy of ASTER, MORAN, and CRNN, respectively.
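
To make the idea of an edge guidance loss concrete, the following is a minimal sketch under stated assumptions, not the loss defined in the paper. It assumes that edges are extracted with a Sobel operator and that the SR and HR edge maps are compared with a mean absolute difference; the names `edge_map` and `edge_loss` and the toy images are hypothetical.

```python
# Illustrative sketch only: an edge-aware loss built from Sobel gradients.
# The Sobel operator and the L1 distance are assumptions, not taken from the paper.

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def edge_map(img):
    """Return the Sobel gradient magnitude for the interior pixels of a 2-D image."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            row.append((gx * gx + gy * gy) ** 0.5)
        out.append(row)
    return out


def edge_loss(sr, hr):
    """Mean absolute difference between the edge maps of two same-sized images."""
    e_sr, e_hr = edge_map(sr), edge_map(hr)
    diffs = [abs(a - b) for ra, rb in zip(e_sr, e_hr) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy 4x4 grayscale images: a sharp vertical stroke edge and a blurred version.
hr = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
sr_blurry = [[0.0, 0.25, 0.75, 1.0] for _ in range(4)]

loss_same = edge_loss(hr, hr)         # identical images: no edge mismatch
loss_blur = edge_loss(sr_blurry, hr)  # blurred stroke: weaker edges are penalized
```

In this toy case the blurred image receives a larger loss than the identical pair, which is the behavior one would want from a term that encourages sharp stroke boundaries. In a training setup such a term would typically be added to a pixel-wise reconstruction loss with a weighting factor, which is again an assumption here.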

STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition

One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance

ESTISR: Adapting Efficient Scene Text Image Super-resolution for Real-Scenes

C3-STISR: Scene Text Image Super-resolution with Triple Clues

Scene Text Image Super-resolution based on Text-conditional Diffusion Models

Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution

Text Prior Guided Scene Text Image Super-Resolution

Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement

A Benchmark for Chinese-English Scene Text Image Super-resolution

HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution

Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network

SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor

Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing

Improving Scene Text Image Super-resolution via Dual Prior Modulation Network

Scene Text Image Super-Resolution in the Wild

TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution

Gradient-Based Graph Attention for Scene Text Image Super-resolution

Orientation-Independent Chinese Text Recognition in Scene Images