Abstract: Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related with text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **how to improve Optical Character Recognition (OCR) on document scans through task-driven single-image super-resolution reconstruction**. Although existing super-resolution (SR) techniques can generate high-resolution images of excellent visual quality, it is rarely verified whether they actually help specific computer vision applications. The authors explore using super-resolution as a preprocessing step to improve text detection in document scans and, for this purpose, propose a task-driven training method that adapts deep networks to the text detection task.

### Core of the problem

1. **Limitations of existing SR techniques**:
   - Existing SR techniques are usually trained and validated on simulated data, so they perform worse than expected under real-world conditions.
   - In practice, performance drops significantly when these techniques are applied directly to original (rather than simulated) low-resolution images.
2. **Requirement for task-driven SR**:
   - Improving a specific task (such as OCR) requires an SR method that can be optimized for that task.
   - Traditional SR methods focus mainly on image similarity and ignore task-related performance metrics.

### Solution

The authors propose a multi-task loss function that combines components related to text detection with components based on image similarity, and they train the deep network with these components balanced dynamically. They also introduce a self-supervised approach that generates labels automatically by processing the high-resolution (HR) reference images, which avoids the complexity and cost of manual annotation.

### Main contributions

1. **Multi-task loss function**: A loss function composed of image similarity and text detection components is proposed, and the weight of each task is adjusted dynamically during training.
2. **Self-supervised training**: Labels are generated automatically from HR reference images, so training requires no manual annotation.
3. **Experimental verification**: Extensive experiments show that the proposed technique can significantly improve the accuracy of text detection after super-resolution reconstruction of document scans.

### Conclusion

This research shows that task-driven super-resolution reconstruction has great potential for improving text detection, especially on real-world document scans. The result provides a useful direction for future related research.
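The summary above does not state how the task weights are balanced, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes that both loss components are mean squared errors, that the text detection target is a pseudo-label obtained by running some detector on the HR reference image (the self-supervised idea), and that each weight is inversely proportional to a running average of its loss. All function names (`mse`, `balanced_weights`, `update_running`, `multi_task_loss`) and the flat-list representation of images and detector outputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized flat sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def balanced_weights(running_losses: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Assumed balancing rule: weights inversely proportional to each
    running loss, normalised so that they sum to 1."""
    inverse = [1.0 / (loss + eps) for loss in running_losses]
    total = sum(inverse)
    return [value / total for value in inverse]


def update_running(running: Sequence[float], current: Sequence[float],
                   momentum: float = 0.9) -> List[float]:
    """Exponential moving average of the per-task losses."""
    return [momentum * r + (1.0 - momentum) * c for r, c in zip(running, current)]


def multi_task_loss(sr_pixels: Sequence[float], hr_pixels: Sequence[float],
                    det_on_sr: Sequence[float], det_on_hr: Sequence[float],
                    running_losses: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Weighted sum of an image-similarity term and a text-detection term.

    det_on_hr plays the role of an automatically generated label: the
    output of a (hypothetical) text detector on the HR reference image.
    """
    image_loss = mse(sr_pixels, hr_pixels)
    detection_loss = mse(det_on_sr, det_on_hr)
    w_image, w_detection = balanced_weights(running_losses)
    total = w_image * image_loss + w_detection * detection_loss
    return total, (image_loss, detection_loss)


if __name__ == "__main__":
    running = [1.0, 1.0]  # start with equal weights
    total, parts = multi_task_loss([0.2, 0.8], [0.0, 1.0], [0.4], [1.0], running)
    running = update_running(running, parts)
    print(total, running)
```

In a real training loop the weights would typically be treated as constants with respect to the gradient, and the detector outputs would be tensors rather than Python lists; the sketch only illustrates how the two components and the dynamic weights fit together.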

Task-driven single-image super-resolution reconstruction of document scans

Super-resolution Reconstruction Algorithms Based on Fusion of Deep Learning Mechanism and Wavelet

Pixel-Level Degradation for Text Image Super-Resolution and Recognition

Analysis and evaluation of Deep Learning based Super-Resolution algorithms to improve performance in Low-Resolution Face Recognition

Document Image Super-Resolution Reconstruction Based on Clustering Learning and Kernel Regression.

Single Remote Sensing Image Super-Resolution Via a Generative Adversarial Network with Stratified Dense Sampling and Chain Training

Scene text image super-resolution using multi-scale convolutional neural network with skip connections

MTSTR: Multi-task learning for low-resolution scene text recognition via dual attention mechanism and its application in logistics industry

Scene Text Image Super-Resolution in the Wild

Scene text image super-resolution via textual reasoning and multiscale cross-convolution

Scene Text Telescope: Text-Focused Scene Image Super-Resolution

Single Image Super Resolution Reconstruction Algorithm Based on Deep Learning

Text Gestalt: Stroke-Aware Scene Text Image Super-resolution

Super-Resolution Video Target Re-Recognition Based on Joint Training

Improving Text Image Resolution Using a Deep Generative Adversarial Network for Optical Character Recognition

Self-Supervised Memory Learning for Scene Text Image Super-Resolution

Scene Text Image Super-Resolution Via Parallelly Contextual Attention Network

Deep Learning Based Single Image Super-resolution: A Survey

TextSR: Content-Aware Text Super-Resolution Guided by Recognition

Iterative-in-Iterative Super-Resolution Biomedical Imaging Using One Real Image

Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks