Self-Supervised Memory Learning for Scene Text Image Super-Resolution

Kehua Guo, Xiangyuan Zhu, Gerald Schaefer, Rui Ding, Hui Fang
DOI: https://doi.org/10.1016/j.eswa.2024.125247
2024-01-01
Abstract: Computerised recognition of low-resolution scene text images has been a persistent challenge. To improve the recognition performance, image quality enhancement via image super-resolution technology provides an intuitive solution. Typical deep learning-based scene text image super-resolution methods assume that the image quality degradation from high-resolution images to their corresponding low-resolution counterparts can be represented by mapping well-distributed samples, which limits their reconstruction performance in a practical text recognition system. For real-world scenarios this assumption typically does not hold since image degradations arise from multiple sources during image capture and processing. In this paper, to alleviate this problem, we propose a novel self-supervised end-to-end memory network model for scene text image super-resolution. In particular, after extracting enriched and finer representations from low-resolution text images via a spatial refinement block, we introduce a memory-based network to yield an improved super-resolution model that can handle complex degradation sources. Furthermore, to boost the effectiveness of our method, we design a multi-term loss to exploit textual structure information, where, in addition to the traditional reconstruction loss, we embed a character perceptual loss and a boundary enhancement loss. Extensive experiments on different datasets demonstrate that our proposed MNTSR method effectively improves the recognition accuracy for several scene text image recognition models and achieves state-of-the-art results. The source code is made available at https://github.com/xyzhu1/MNTSR.
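The multi-term loss described in the abstract combines a reconstruction term with a character perceptual term and a boundary enhancement term. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: the terms are combined as a weighted sum, the weights `w_char` and `w_boundary` are hypothetical, the boundary term is approximated by comparing finite differences of 1-D signals, and the character perceptual term is passed in as a caller-supplied function. None of these names come from the MNTSR source code.

```python
from typing import Callable, Sequence

Signal = Sequence[float]


def reconstruction_loss(pred: Signal, target: Signal) -> float:
    """Mean squared error between the prediction and the target."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def boundary_loss(pred: Signal, target: Signal) -> float:
    """Hypothetical stand-in for a boundary term.

    Compares first-order finite differences (a crude 1-D edge measure)
    of the two signals with a mean absolute error.
    """
    d_pred = [b - a for a, b in zip(pred, pred[1:])]
    d_target = [b - a for a, b in zip(target, target[1:])]
    return sum(abs(x - y) for x, y in zip(d_pred, d_target)) / len(d_pred)


def multi_term_loss(
    pred: Signal,
    target: Signal,
    char_loss: Callable[[Signal, Signal], float],
    w_char: float = 1.0,      # assumed weight, not from the paper
    w_boundary: float = 1.0,  # assumed weight, not from the paper
) -> float:
    """Weighted sum of reconstruction, character perceptual, and boundary terms."""
    return (
        reconstruction_loss(pred, target)
        + w_char * char_loss(pred, target)
        + w_boundary * boundary_loss(pred, target)
    )
```

In an actual super-resolution pipeline, `char_loss` would typically compare features produced by a text recogniser for the super-resolved and high-resolution images; here it is left abstract so the sketch stays self-contained.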