Abstract: Optical degradation blurs text shapes and edges, so existing scene text recognition methods struggle to achieve desirable results on low-resolution (LR) scene text images captured in real-world environments. This problem can be addressed by efficiently extracting sequential information to reconstruct super-resolution (SR) text images, but doing so remains challenging. In this paper, we propose a Parallelly Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses more on reconstructing high-frequency information in text images. First, we explore the importance of sequence-dependent features in the horizontal and vertical directions, processed in parallel, for text SR, and then design a parallelly contextual attention block to adaptively select the key information in the text sequence that contributes to image super-resolution. Second, we propose a hierarchically orthogonal texture-aware attention module and an edge guidance loss function, which help reconstruct high-frequency information in text images. Finally, we conduct extensive experiments on the TextZoom dataset, and our SR results can be easily fed into mainstream text recognition algorithms to further improve their performance on LR image recognition. In addition, our approach exhibits strong robustness against adversarial attacks on seven mainstream scene text recognition datasets, which means it can also improve the security of the text recognition pipeline. Compared with directly recognizing LR images, our method improves the recognition accuracy of ASTER, MORAN, and CRNN by 14.9%, 14.0%, and 20.1%, respectively. Our method outperforms eleven state-of-the-art (SOTA) SR methods in terms of boosting text recognition performance. Most importantly, it outperforms the current best text-oriented SR method, TSRN, by 3.2%, 3.7%, and 6.0% in the recognition accuracy of ASTER, MORAN, and CRNN, respectively.
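The abstract does not specify the internals of the parallelly contextual attention block or the edge guidance loss. The NumPy code below is only a minimal sketch, under our own assumptions, of the two ideas it names. It is not the authors' implementation. One part runs self-attention along rows (horizontal) and columns (vertical) of a feature map in parallel and fuses the two branches. The other compares gradient (edge) maps of the SR and high-resolution images. The function names (`axis_self_attention`, `parallel_contextual_attention`, `gradient_magnitude`, `edge_guidance_loss`) are hypothetical. Several simplifications are also our own assumptions: queries, keys, and values carry no learned projections, the branches are fused by a residual sum, and edges come from finite differences scored with an L1 distance.

```python
import numpy as np


def _softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def axis_self_attention(seqs: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a batch of sequences.

    seqs: (num_seqs, seq_len, channels). Queries, keys and values are the raw
    features (no learned projections), a simplification for illustration.
    """
    d = seqs.shape[-1]
    scores = seqs @ seqs.transpose(0, 2, 1) / np.sqrt(d)  # (N, L, L)
    return _softmax(scores, axis=-1) @ seqs  # (N, L, C)


def parallel_contextual_attention(feat: np.ndarray) -> np.ndarray:
    """Hypothetical block: attend along rows and columns in parallel, then fuse.

    feat: (H, W, C) feature map. Returns an array of the same shape.
    """
    # Horizontal branch: each of the H rows is a sequence of W positions.
    horizontal = axis_self_attention(feat)
    # Vertical branch: each of the W columns is a sequence of H positions.
    vertical = axis_self_attention(feat.transpose(1, 0, 2)).transpose(1, 0, 2)
    # Assumed fusion rule: residual sum of the input and both branches.
    return feat + horizontal + vertical


def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    # Forward finite differences; the last column/row gradient is left at zero.
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_guidance_loss(sr: np.ndarray, hr: np.ndarray) -> float:
    """Hypothetical edge loss: mean L1 distance between gradient maps."""
    return float(np.mean(np.abs(gradient_magnitude(sr) - gradient_magnitude(hr))))
```

With a constant feature map, every attention weight is uniform, so each branch returns the input unchanged and the fused output is three times the input. An SR image whose edges match the ground truth exactly incurs zero edge loss, while a missing vertical stroke edge is penalized.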

An End-to-End OCR Text Re-organization Sequence Learning for Rich-Text Detail Image Comprehension

Scene Text Detection and Recognition System for Visually Impaired People in Real World

Reading Scene Text with Attention Convolutional Sequence Modeling

Beyond OCR + VQA: Towards End-to-End Reading and Reasoning for Robust and Accurate TextVQA

Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

Improving OCR-based Image Captioning by Incorporating Geometrical Relationship

A holistic representation guided attention network for scene text recognition

Robust Scene Text Recognition Through Adaptive Image Enhancement

Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection

Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences

TextNet: Irregular Text Reading from Images with an End-to-End Trainable Network

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network

High-Resolution Image Classification with Rich Text Information Based on Graph Convolution Neural Network

A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Robust Scene Text Recognition with Automatic Rectification

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Scene text image super-resolution via textual reasoning and multiscale cross-convolution

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables