Batch-transformer for scene text image super-resolution

Yaqi Sun,Xiaolan Xie,Zhi Li,Kai Yang
DOI: https://doi.org/10.1007/s00371-024-03598-7
IF: 2.835
2024-10-01
The Visual Computer
Abstract:Recognizing low-resolution text images is challenging as they often lose their detailed information, leading to poor recognition accuracy. Moreover, the traditional methods, based on deep convolutional neural networks (CNNs), are not effective enough for some low-resolution text images with dense characters. In this paper, a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR) method is proposed to address this problem. In order to obtain the text information for text reconstruction, a pre-trained text prior module is employed to extract text information. Then a novel two pipeline batch-transformer-based module is proposed, leveraging self-attention and global attention mechanisms to exert the guidance of text prior to the text reconstruction process. Experimental study on a benchmark dataset TextZoom shows that the proposed method BT-STISR achieves the best state-of-the-art performance in terms of structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) metrics compared to some latest methods.
computer science, software engineering
What problem does this paper attempt to address?