Open-Vocabulary Scene Text Recognition Via Pseudo-Image Labeling and Margin Loss

Xuhua Ren, Hengcan Shi, Jin Li
DOI: https://doi.org/10.48550/arxiv.2403.07518
2024-01-01
Abstract: Scene text recognition is an important and challenging task in computer vision. However, most prior works focus on recognizing pre-defined words, while there are various out-of-vocabulary (OOV) words in real-world applications. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To solve this problem, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce substantial pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds to simulate real-world applications. Secondly, to reduce noise in pseudo data, we present a semantic checking mechanism to filter semantically meaningful data. Thirdly, we introduce a quality-aware margin loss to boost the training with pseudo data. Our loss includes a margin-based part to enhance the classification ability, and a quality-aware part to penalize low-quality samples in both real and pseudo data. Extensive experiments demonstrate that our approach outperforms the state-of-the-art on eight datasets and achieves the first rank in the ICDAR 2022 challenge.
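
The abstract does not give the exact form of the quality-aware margin loss, so the following is only a rough illustration of the general idea, not the authors' formulation. It assumes an additive margin subtracted from the target-class logit (in the style of margin-softmax losses) and a hypothetical per-sample quality score in [0, 1] that down-weights low-quality samples. The function names, the margin value, and the weighting scheme are all assumptions made for this sketch.

```python
import math


def margin_cross_entropy(logits, target, margin=0.3, scale=1.0):
    """Cross-entropy after subtracting an additive margin from the target logit.

    Assumption: an additive margin; the paper may use a different margin form.
    """
    adjusted = [
        scale * (z - margin if i == target else z)
        for i, z in enumerate(logits)
    ]
    # Numerically stable log-sum-exp over the adjusted logits.
    peak = max(adjusted)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in adjusted))
    return log_sum - adjusted[target]


def quality_weighted_loss(batch_logits, targets, qualities, margin=0.3):
    """Average the margin loss over a batch, weighting each sample by quality.

    Assumption: `qualities` holds hypothetical per-sample scores in [0, 1];
    a score of 0 removes the sample's contribution entirely.
    """
    total = 0.0
    for logits, target, quality in zip(batch_logits, targets, qualities):
        total += quality * margin_cross_entropy(logits, target, margin)
    # Normalize by the total weight so the scale is comparable across batches.
    return total / max(sum(qualities), 1e-8)


if __name__ == "__main__":
    # Two character-classification samples; the second is treated as low quality.
    batch = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    print(quality_weighted_loss(batch, targets=[0, 1], qualities=[1.0, 0.2]))
```

With a positive margin the target class must beat the other classes by a wider gap to reach the same loss, which is one common way to sharpen classification boundaries; the quality weights let noisy pseudo samples contribute less to the gradient.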