Sequential Style Consistency Learning for Domain-Generalizable Text Recognition.

Pengcheng Zhang, Wenrui Liu, Ning Wang, Ran Shen, Gang Sun, Xinghua Jiang, Zheqian Chen, Fei Wu, Zhou Zhao
DOI: https://doi.org/10.1007/978-981-99-8850-1_40
2024-01-01
Abstract: Text recognition, the task of recognizing text from images, is of great significance in both industry and academia. The vast majority of existing text recognition methods train and test on text images with the same styles. However, when these models encounter images with new styles, their recognition accuracy drops significantly. In this paper, we explore Domain-Generalizable Text Recognition (DGTR), a challenging but meaningful setting that focuses on enhancing the generalization ability of text recognition models. To this end, we propose a practical framework called Sequential Style Consistency Learning (SSC), which disentangles style-specific and task-specific representations. Specifically, SSC first constructs augmented visual feature sequences, then disentangles the original and augmented feature sequences into style-specific features and task-specific features. To better separate the task-specific representation from the style-specific representation, Style-Consistency Learning (SCL) is designed to learn the style consistency between the original and augmented sequences. The disentanglement module and style-consistency learning provide complementary information to each other. In addition, SSC is encouraged to meta-learn the style-specific and task-specific features during training on text images with seen styles, so that it generalizes better to text images with other styles. Extensive experiments and analyses on the benchmark dataset MSDA show that SSC achieves very competitive results compared with state-of-the-art methods.
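The pipeline outlined in the abstract (augment a feature sequence, disentangle style from task features, then compare the original and augmented sequences) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the modules, so the sketch assumes that "style" can be approximated by per-channel mean and standard deviation over the time axis, that augmentation mixes these statistics with those of another sequence, and that the consistency term is a mean squared error between task-specific features. All function names here are hypothetical.

```python
import math
import random


def channel_stats(seq, eps=1e-6):
    """Per-channel mean and std over the time axis of a T x D sequence."""
    T, D = len(seq), len(seq[0])
    mean = [sum(step[d] for step in seq) / T for d in range(D)]
    std = [
        math.sqrt(sum((step[d] - mean[d]) ** 2 for step in seq) / T + eps)
        for d in range(D)
    ]
    return mean, std


def disentangle(seq):
    """Assumed split: style = channel statistics, task = normalized sequence."""
    mean, std = channel_stats(seq)
    task = [[(step[d] - mean[d]) / std[d] for d in range(len(mean))] for step in seq]
    return (mean, std), task


def augment(seq, other, lam=0.5):
    """Assumed augmentation: restyle `seq` with statistics mixed from `other`."""
    (m1, s1), task = disentangle(seq)
    m2, s2 = channel_stats(other)
    mean = [lam * a + (1 - lam) * b for a, b in zip(m1, m2)]
    std = [lam * a + (1 - lam) * b for a, b in zip(s1, s2)]
    return [[t[d] * std[d] + mean[d] for d in range(len(mean))] for t in task]


def consistency_loss(task_a, task_b):
    """Mean squared error between two task-specific feature sequences."""
    n = len(task_a) * len(task_a[0])
    return sum(
        (x - y) ** 2 for row_a, row_b in zip(task_a, task_b) for x, y in zip(row_a, row_b)
    ) / n


# Toy data: two random 8-step, 4-channel "feature sequences" with different styles.
random.seed(0)
x = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
y = [[random.gauss(3.0, 2.0) for _ in range(4)] for _ in range(8)]

x_aug = augment(x, y)
style_x, task_x = disentangle(x)
style_aug, task_aug = disentangle(x_aug)
loss = consistency_loss(task_x, task_aug)
print("consistency loss:", loss)
```

In this toy setting the task-specific features are nearly unchanged by the restyling while the style statistics shift, which is the separation the abstract aims for. In the paper these components are learned rather than computed from fixed statistics.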