STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition

Minyi Zhao,Shijie Xuyang,Jihong Guan,Shuigeng Zhou
DOI: https://doi.org/10.1145/3581783.3612488
2023-01-01
Abstract:Though scene text recognition (STR) from high-resolution (HR) images has achieved significant success in the past years, text recognition from low-resolution (LR) images is still a challenging task. This inspires the study on scene text image super-resolution (STISR) to generate super-resolution (SR) images based on the LR images, then STR is performed on the generated SR images, which eventually boosts the recognition performance. However, existing methods have two major drawbacks: 1) STISR models may generate imperfect SR images, which mislead the subsequent recognition. 2) As the STISR models are optimized for high recognition accuracy, the fidelity of SR images may be degraded. Consequently, neither the recognition performance of STR nor the fidelity of STISR is desirable. In this paper, a novel model called STIRER (the abbreviation of Scene Text Image REcovery and Recognition) is proposed to effectively and simultaneously recover and recognize LR scene text images under a unified framework. Concretely, STIRER consists of a feature encoder to obtain pixel features and two dedicated decoders to generate SR images and recognize texts respectively based on the encoded features and the raw LR images. We propose a progressive scene text swin transformer architecture as the encoder to enrich the representations of the pixel features for better recovery and recognition. Extensive experiments on two LR datasets show the superiority of our model to the existing methods on recognition performance, super-resolution fidelity and computational cost. The STIRER Code is available in https://github.com/zhaominyiz/STIRER.
What problem does this paper attempt to address?