Synthetically Supervised Feature Learning For Scene Text Recognition

Yang Liu,Zhaowen Wang,Hailin Jin,Ian J. Wassell
DOI: https://doi.org/10.1007/978-3-030-01228-1_27
2018-01-01
Abstract:We address the problem of image feature learning for scene text recognition. The image features in the state-of-the-art methods are learned from large-scale synthetic image datasets. However, most methods only rely on outputs of the synthetic data generation process, namely realistically looking images, and completely ignore the rest of the process. We propose to leverage the parameters that lead to the output images to improve image feature learning. Specifically, for every image out of the data generation process, we obtain the associated parameters and render another "clean" image that is free of select distortion factors that are applied to the output image. Because of the absence of distortion factors, the clean image tends to be easier to recognize than the original image which can serve as supervision. We design a multi-task network with an encoder-discriminator-generator architecture to guide the feature of the original image toward that of the clean image. The experiments show that our method significantly outperforms the state-of-the-art methods on standard scene text recognition benchmarks in the lexicon-free category. Furthermore, we show that without explicit handling, our method works on challenging cases where input images contain severe geometric distortion, such as text on a curved path.
What problem does this paper attempt to address?