Self-Training for Domain Adaptive Scene Text Detection

Yudi Chen, Wei Wang, Yu Zhou, Fei Yang, Dongbao Yang, Weiping Wang
DOI: https://doi.org/10.1109/icpr48806.2021.9412558
2020-01-01
Abstract: Though deep learning based scene text detection has achieved great progress, well-trained detectors suffer from severe performance degradation on different domains. In general, a tremendous amount of data is indispensable to train the detector in the target domain. However, data collection and annotation are expensive and time-consuming. To address this problem, we propose a self-training framework to automatically mine hard examples with pseudo-labels from unannotated videos or images. To reduce the noise of hard examples, a novel text mining module is implemented based on the fusion of detection and tracking results. Then, an image-to-video generation method is designed for tasks where videos are unavailable and only images can be used. Experimental results on standard benchmarks, including ICDAR2015, MSRA-TD500, and ICDAR2017 MLT, demonstrate the effectiveness of our self-training method. A simple Mask R-CNN adapted with self-training and fine-tuned on real data achieves results comparable or even superior to those of state-of-the-art methods.