Scene Text Recognition with Self-supervised Contrastive Predictive Coding

Xinzhe Jiang, Jianshu Zhang, Jun Du, Zhenrong Zhang, Jiajia Wu
DOI: https://doi.org/10.1109/ICPR56361.2022.9956631
2022-01-01
Abstract: Self-supervised visual pre-training has recently emerged in scene text recognition (STR): pretext tasks are designed so that unlabeled data can be used to learn representations useful for STR. However, most current self-supervised methods pay little attention to sequence awareness. Accordingly, we propose a novel self-supervised STR method based on contrastive predictive coding (STR-CPC), which treats a text instance as a left-to-right sequence and captures the visual sequence correlation. To address the information overlap within the feature map induced by the deep convolutional neural network (CNN) encoder, we design a widthwise causal convolution for model pre-training and a progressive recovery training strategy (PRTS) for model fine-tuning to improve STR performance. Experiments on scene text show that STR-CPC outperforms existing self-supervised methods, which demonstrates the advantage of visual sequence correlation for STR. Additionally, STR-CPC noticeably boosts performance compared with supervised training when the amount of labeled data decreases.
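
The abstract does not specify how the widthwise causal convolution is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a 1D kernel applied along the width axis with zero padding on the left only, so that the output at column t depends on input columns up to t and never on columns to its right. The function name `widthwise_causal_conv` and all shapes are hypothetical choices made for illustration.

```python
import numpy as np

def widthwise_causal_conv(x, kernel):
    """Causal 1D convolution along the width axis (illustrative sketch).

    x:      feature map of shape (C_in, H, W)
    kernel: weights of shape (C_out, C_in, K), applied along width only
    Returns an array of shape (C_out, H, W).
    """
    c_out, c_in, k = kernel.shape
    height, width = x.shape[1], x.shape[2]
    # Pad K-1 zero columns on the left only, so column t never sees columns > t.
    padded = np.pad(x, ((0, 0), (0, 0), (k - 1, 0)))
    out = np.zeros((c_out, height, width))
    for t in range(width):
        # Window covers original columns t-K+1 .. t (zeros where t-K+1 < 0).
        window = padded[:, :, t:t + k]  # shape (C_in, H, K)
        out[:, :, t] = np.einsum('ihk,oik->oh', window, kernel)
    return out

# Usage: perturbing columns 5.. of the input leaves output columns 0..4 unchanged.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 8))
kernel = rng.normal(size=(4, 2, 3))
y = widthwise_causal_conv(x, kernel)
x_perturbed = x.copy()
x_perturbed[:, :, 5:] += 1.0
y_perturbed = widthwise_causal_conv(x_perturbed, kernel)
```

Under these assumptions, features to the left of a position carry no information about what lies to its right, which is the property a left-to-right predictive objective relies on.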