Multi-oriented Scene Text Detector with Atrous Convolution

Di Pan, Fei Yu, Chunguo Li, Luxi Yang
DOI: https://doi.org/10.1109/ictc49638.2020.9123297
2020-01-01
Abstract: Semantic segmentation has recently been widely used for text detection, and many excellent text detection methods have been proposed. These methods usually adopt deep convolutional neural networks with consecutive striding or pooling operations to obtain a larger receptive field. However, this leads to a loss of context that is crucial for text detection. In this paper, we propose an end-to-end trainable neural network that directly detects text regions, with no redundant stages other than locality-aware non-maximum suppression. We introduce atrous convolution into the backbone network to enlarge the receptive field, retaining more context information while controlling the spatial resolution of the feature maps. An Atrous Spatial Pyramid Pooling (ASPP) module is attached on top of the feature maps to effectively detect text at multiple scales. We benchmark our algorithm on three public datasets, where it achieves highly competitive text localization precision. In particular, on the MSRA-TD500 dataset, the proposed algorithm achieves an F-score of 0.813, outperforming the previous best by a large margin.
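The trade-off between striding and atrous (dilated) convolution described in the abstract can be illustrated with a little receptive-field arithmetic. The following is a minimal sketch, not the paper's backbone: the function names, the three-layer stacks, and the 64-pixel input are all assumptions chosen for illustration, and padding is assumed to be "same"-style.

```python
def effective_kernel(kernel, dilation):
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel - 1) + 1


def trace(layers, input_size):
    """Return (receptive_field, output_size) along one axis.

    `layers` is a list of (kernel, stride, dilation) tuples.
    Hypothetical helper; assumes symmetric 'same'-style padding.
    """
    receptive_field, jump, size = 1, 1, input_size
    for kernel, stride, dilation in layers:
        k_eff = effective_kernel(kernel, dilation)
        # Each layer widens the receptive field by (k_eff - 1) input-space steps.
        receptive_field += (k_eff - 1) * jump
        jump *= stride
        pad = (k_eff - 1) // 2
        size = (size + 2 * pad - k_eff) // stride + 1
    return receptive_field, size


# Illustrative stacks (assumed, not taken from the paper).
strided = [(3, 2, 1), (3, 2, 1), (3, 2, 1)]  # 3x3 convs, stride 2
atrous = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # 3x3 convs, dilation 1, 2, 4

print(trace(strided, 64))  # (15, 8)
print(trace(atrous, 64))   # (15, 64)
```

Under these assumptions, both stacks reach the same receptive field of 15 pixels, but the strided stack shrinks a 64-pixel axis to 8 while the atrous stack keeps all 64 positions, which is the property the abstract relies on to retain context without sacrificing resolution.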