Specific category region proposal network for text detection in natural scene

Yuanhong Zhong, Xinyu Cheng, Zhaokun Zhou, Shun Zhang, Jing Zhang, Guan Huang
DOI: https://doi.org/10.1049/iet-ipr.2019.0652
IF: 2.3
2020-06-09
IET Image Processing
Abstract: Natural scene text usually carries considerable abstract semantic information that is closely related to the surrounding environment. Thus, natural scene text detection plays a vital role in image content retrieval and understanding. In this study, the authors propose a novel specific category region proposal network (SCRPN) based on maximally stable extremal regions (MSER) and a fully convolutional network (FCN) for natural scene text detection. First, an FCN for pixel-level recognition is utilised to obtain the text saliency map, and MSER is used to obtain oversegmented regions. Then, multiple features of the oversegmented regions and the text saliency map are used for region aggregation. Next, a single-linkage clustering method is adopted to cluster the segmentation regions into a hierarchical structure of text region proposals. Finally, for the top-ranking region proposals, SCRPN builds an end-to-end pipeline that performs scene text detection directly. Experiments on the Street View Text and International Conference on Document Analysis and Recognition (ICDAR) 2013 datasets demonstrate the effectiveness of SCRPN for generating text proposals. SCRPN can work with various two-stage text detection networks; thus, the faster region-based convolutional neural network (Faster R-CNN) was used as the text detection framework to evaluate the performance of SCRPN on the ICDAR 2015 and MSRA-TD500 benchmarks. The experimental results confirm that SCRPN makes text detection more robust in complex scenarios.
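The single-linkage step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each oversegmented region is reduced to an axis-aligned box (x0, y0, x1, y1) and that the only feature is the Euclidean distance between box centres, whereas the paper aggregates multiple region features together with the text saliency map. The function names (union_box, centre, single_linkage_proposals) are hypothetical. Every merge in the hierarchy contributes one candidate text region proposal.

```python
from itertools import combinations


def union_box(a, b):
    """Smallest box (x0, y0, x1, y1) that covers both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def centre(box):
    """Centre point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def single_linkage_proposals(boxes):
    """Agglomerate boxes by single linkage and return one box per tree node.

    Assumption: centre distance stands in for the richer region features
    used in the paper.
    """
    # Each live cluster stores its member indices and its covering box.
    clusters = {i: ([i], b) for i, b in enumerate(boxes)}
    proposals = list(boxes)  # leaves of the hierarchy
    next_id = len(boxes)

    def dist(i, j):
        (xi, yi), (xj, yj) = centre(boxes[i]), centre(boxes[j])
        return ((xi - xj) ** 2 + (yi - yj) ** 2) ** 0.5

    def linkage(pair):
        # Single linkage: distance between the closest pair of members.
        a, b = pair
        return min(dist(i, j) for i in clusters[a][0] for j in clusters[b][0])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        members = clusters[a][0] + clusters[b][0]
        box = union_box(clusters[a][1], clusters[b][1])
        del clusters[a], clusters[b]
        clusters[next_id] = (members, box)
        proposals.append(box)  # internal node of the hierarchy
        next_id += 1
    return proposals


# Two adjacent character-like boxes and one distant box.
regions = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10)]
proposals = single_linkage_proposals(regions)
```

With these three regions the two nearby boxes merge first, giving a word-level proposal, before the distant box joins at the root. A ranking step, such as one driven by the text saliency map, would then select the top proposals; that step is not shown here.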
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology