Task-driven single-image super-resolution reconstruction of document scans

Maciej Zyrek, Michal Kawulok
2024-07-12
Abstract: Super-resolution reconstruction is aimed at generating images of high spatial resolution from low-resolution observations. State-of-the-art super-resolution techniques underpinned with deep learning allow for obtaining results of outstanding visual quality, but it is seldom verified whether they constitute a valuable source for specific computer vision applications. In this paper, we investigate the possibility of employing super-resolution as a preprocessing step to improve optical character recognition from document scans. To achieve that, we propose to train deep networks for single-image super-resolution in a task-driven way to make them better adapted for the purpose of text detection. As problems limited to a specific task are heavily ill-posed, we introduce a multi-task loss function that embraces components related with text detection coupled with those guided by image similarity. The obtained results reported in this paper are encouraging and they constitute an important step towards real-world super-resolution of document images.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is: **how to improve optical character recognition (OCR) on document scans through task-driven single-image super-resolution reconstruction**. Existing super-resolution (SR) techniques can generate high-resolution images of excellent visual quality, but it is seldom verified whether their output actually helps specific computer vision applications. The authors explore SR as a preprocessing step for text detection in document scans and propose a task-driven training method that adapts deep SR networks to that task.

### Core of the problem

1. **Limitations of existing SR techniques**:
   - Existing SR techniques are usually trained and validated on simulated data, so they perform worse than expected under real-world conditions.
   - When they are applied directly to original low-resolution images, performance declines significantly.
2. **Requirement for task-driven SR**:
   - Improving a specific task (such as OCR) requires an SR method that can be optimized for that task.
   - Traditional SR methods focus mainly on image similarity and ignore task-related performance indicators.

### Solution

The authors propose a multi-task loss function that combines text detection components with image similarity components, and they balance these components dynamically during training. They also introduce a self-supervised scheme in which labels are generated automatically by processing the high-resolution reference images, which avoids the complexity and cost of manual annotation.

### Main contributions

1. **Multi-task loss function**: A loss function composed of image similarity and text detection components, with the weight of each task adjusted dynamically during training.
2. **Self-supervised training**: Labels are generated automatically from the HR reference images, so training requires no manual annotation.
3. **Experimental verification**: Experiments indicate that the proposed technique improves text detection on super-resolved document scans; the authors describe the results as encouraging.

### Conclusion

This research shows that task-driven super-resolution reconstruction has considerable potential for improving text detection, especially on real-world document scans, and it offers a useful direction for future work.
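
The dynamically weighted multi-task loss described under Solution can be illustrated with a minimal Python sketch. The summary does not state which balancing rule the authors use, so the rule below is an assumption chosen for simplicity: each weight is inversely proportional to an exponential moving average of its loss term, so that terms of very different magnitude contribute comparably. The class name `DynamicLossBalancer`, the keys `image_similarity` and `text_detection`, and the numeric loss values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicLossBalancer:
    """Combine named loss terms with dynamically adjusted weights.

    Assumption (not taken from the paper): each weight is inversely
    proportional to an exponential moving average (EMA) of its loss term,
    and the weights are normalized to sum to 1.
    """

    momentum: float = 0.9
    eps: float = 1e-8
    ema: dict = field(default_factory=dict)

    def update(self, losses: dict) -> dict:
        """Update the running magnitudes and return the current weights."""
        for name, value in losses.items():
            # On the first call the EMA is initialized to the observed value.
            prev = self.ema.get(name, value)
            self.ema[name] = self.momentum * prev + (1.0 - self.momentum) * value
        # Larger running loss -> smaller weight.
        inverse = {name: 1.0 / (self.ema[name] + self.eps) for name in losses}
        total = sum(inverse.values())
        return {name: w / total for name, w in inverse.items()}

    def combine(self, losses: dict) -> float:
        """Return the weighted sum of the loss terms for one training step."""
        weights = self.update(losses)
        return sum(weights[name] * losses[name] for name in losses)


# Hypothetical per-step loss values, e.g. a pixel-wise similarity loss and a
# text-detection loss computed on the super-resolved image.
balancer = DynamicLossBalancer()
step_losses = {"image_similarity": 0.02, "text_detection": 1.5}
weights = balancer.update(step_losses)
total_loss = balancer.combine(step_losses)
```

The sketch works on plain floats. In an actual training loop the loss terms would be framework tensors, and the running averages would be computed from detached scalar values so that the weights themselves do not receive gradients.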