Efficient Backbone Search for Scene Text Recognition

Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai
DOI: https://doi.org/10.1007/978-3-030-58586-0_44
2020-01-01
Abstract: Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boosting performance by improving the image pre-processing module, such as rectification and deblurring, or the sequence translator. However, another critical module, i.e., the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search data-dependent backbones that boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices on operations and constraints on the downsampling path. Then, we propose a two-step search algorithm, which decouples operations and the downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.
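The abstract names two ideas without detailing them: a constraint on the downsampling path and a two-step search that decouples the path from the operations. The following toy sketch is only an illustration of those two ideas under stated assumptions. The per-layer stride choices, layer count, required total downsampling, operation names, and the `toy_score` proxy are all hypothetical. A real search would score candidates by training them on recognition data, and the exhaustive path step plus greedy per-layer operation step used here is a simple stand-in, not the paper's search algorithm.

```python
from itertools import product

# Hypothetical per-layer stride choices: (height_stride, width_stride).
STRIDE_CHOICES = [(1, 1), (2, 1), (2, 2)]


def valid_downsampling_paths(num_layers, total_h, total_w):
    """Enumerate stride sequences whose cumulative strides equal the
    required total downsampling in height and width (the path constraint)."""
    paths = []
    for path in product(STRIDE_CHOICES, repeat=num_layers):
        h = w = 1
        for sh, sw in path:
            h *= sh
            w *= sw
        if h == total_h and w == total_w:
            paths.append(path)
    return paths


def two_step_search(paths, ops, score):
    """Step 1: pick the path with operations held at a default.
    Step 2: hold that path fixed and pick each layer's operation greedily."""
    default = [ops[0]]
    best_path = max(paths, key=lambda p: score(p, default * len(p)))
    chosen = default * len(best_path)
    for i in range(len(best_path)):
        # Try every operation at layer i while keeping the other layers fixed.
        chosen[i] = max(
            ops, key=lambda op: score(best_path, chosen[:i] + [op] + chosen[i + 1:])
        )
    return best_path, chosen


def toy_score(path, ops):
    """Hypothetical proxy score: rewards downsampling in later layers and
    rewards the (made-up) 'conv3x3' operation. Not a real accuracy measure."""
    late = sum(i for i, stride in enumerate(path) if stride != (1, 1))
    return late + sum(1 for op in ops if op == "conv3x3")


if __name__ == "__main__":
    paths = valid_downsampling_paths(num_layers=4, total_h=4, total_w=2)
    print(len(paths), "valid paths")
    print(two_step_search(paths, ["conv1x1", "conv3x3"], toy_score))
```

In this toy setting the decoupling keeps the number of evaluations additive: step 1 scores each of the 12 valid paths once and step 2 scores 2 operations at each of 4 layers, 20 evaluations in total, instead of 12 × 2⁴ = 192 for every (path, operations) combination.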