Abstract: Scene text image super-resolution is a challenging task that aims to enhance the spatial resolution of low-resolution text images in the wild, thereby improving visual quality and boosting the performance of real-world text-related applications. However, most previous super-resolution methods ignore the specific characteristics of text patterns and treat scene text images as natural scene images. In this paper, a novel deep convolutional architecture is proposed specifically for the super-resolution of scene text images. To recover fine details of low-resolution characters, the architecture rests on three design choices: (1) Multi-scale feature extraction through parallel convolutional layers, which preserves both the local and global high-frequency components that encode the intricate details of character patterns. This allows the method to capture fine nuances in the visual representation of characters and enriches the extracted features. (2) Skip connections through convolutional layers, which carry information from lower to higher layers of the deep architecture so that sequential information about text patterns is preserved more effectively. (3) A specialized network-in-network-based reconstruction module that recovers high-resolution text details from the collected features. This design minimizes information loss and improves the method's ability to discern and reconstruct fine textual details.
Together, these design elements enable the method to analyze fine text patterns and reconstruct them at high resolution. Quantitative and qualitative evaluations on four well-known benchmarks, namely the SVT, IIIT5k, IC03 and ICDAR2015-TextSR datasets, demonstrate the effectiveness of our proposal, which outperforms several state-of-the-art super-resolution methods.
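The three design choices can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the kernel sizes (3 and 5), the number of blocks and feature maps, the use of concatenation for the skip connections, the 1x1 convolution standing in for the network-in-network stage, and the pixel-shuffle upsampling are all illustrative choices that the abstract does not specify. The names `build`, `forward`, `conv2d` and `pixel_shuffle` are hypothetical helpers, and the weights are random rather than trained.

```python
import numpy as np

def conv2d(x, w, b):
    """Zero-padded 'same' convolution. x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    c, h, w = x.shape
    c_out = c // (r * r)
    x = x.reshape(c_out, r, r, h, w).transpose(0, 3, 1, 4, 2)
    return x.reshape(c_out, h * r, w * r)

def init(rng, c_out, c_in, k):
    # Random, untrained weights: enough to check shapes and data flow.
    return rng.normal(0.0, 0.1, size=(c_out, c_in, k, k)), np.zeros(c_out)

def build(rng, channels=1, feats=8, blocks=2, scale=2):
    params = {"blocks": []}
    c_in = channels
    for _ in range(blocks):
        # Assumption: two parallel branches with 3x3 and 5x5 kernels.
        params["blocks"].append({"k3": init(rng, feats, c_in, 3),
                                 "k5": init(rng, feats, c_in, 5)})
        c_in = 2 * feats
    # Assumption: a 1x1 convolution plays the network-in-network role.
    params["nin"] = init(rng, feats, blocks * 2 * feats, 1)
    params["up"] = init(rng, channels * scale * scale, feats, 3)
    return params

def forward(params, lr, scale=2):
    skips, x = [], lr
    for blk in params["blocks"]:
        # (1) Multi-scale extraction: parallel branches, concatenated along channels.
        x = np.concatenate([relu(conv2d(x, *blk["k3"])),
                            relu(conv2d(x, *blk["k5"]))], axis=0)
        # (2) Skip connection: every block's output is kept for reconstruction.
        skips.append(x)
    collected = np.concatenate(skips, axis=0)
    # (3) Reconstruction: 1x1 mixing of collected features, then upsampling.
    y = relu(conv2d(collected, *params["nin"]))
    return pixel_shuffle(conv2d(y, *params["up"]), scale)

rng = np.random.default_rng(0)
params = build(rng)
low_res = rng.random((1, 16, 64))   # one grayscale 16x64 text crop
super_res = forward(params, low_res)
print(super_res.shape)
```

With the default scale factor of 2, the 16x64 input produces a 32x128 output. Concatenating the outputs of all blocks before the 1x1 stage is one common way to let shallow, high-frequency features reach the reconstruction step directly; an additive residual path would be an equally plausible reading of the abstract.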
