Scene Text Detection with Feature Pyramid Network and Linking Segments

Xi Liu, Rui Zhang, Yongsheng Zhou, Dong Wang
DOI: https://doi.org/10.1109/icdar.2019.00087
2019-01-01
Abstract: Scene text detection is one of the most challenging problems in computer vision and has attracted great interest. Unlike generic object detection, scene text detection suffers mainly from the large variance in the scale, aspect ratio, and orientation of scene text. In this paper, we propose an effective and efficient model (SEG-FPN) for scene text detection, based on the Feature Pyramid Network (FPN) and Linking Segments (SegLink). We incorporate a feature pyramid mechanism into the Single Shot Detector (SSD) framework to handle text at different scales, and we link locally detectable elements to detect text of different orientations and aspect ratios. Moreover, compared with SSD, we enlarge the feature maps of the deep layers to better localize large text and to recognize small text accurately. Experiments on the ICDAR 2015 and ICDAR 2013 datasets demonstrate that our method achieves competitive performance in terms of both accuracy and speed. Specifically, SEG-FPN achieves an f-measure of 0.820 at 10.3 fps on 1280×768 ICDAR 2015 Incidental text images, and an f-measure of 0.879 at 19.2 fps on 512×512 ICDAR 2013 focused scene text images.
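The feature pyramid mechanism mentioned in the abstract can be illustrated with a minimal sketch of the generic FPN top-down pathway: a coarse, deep feature map is upsampled and added to the next finer map, so that fine-resolution levels also carry deep-layer information. This is not the SEG-FPN implementation. The function names (upsample2x, top_down_merge), the single-channel nested-list maps, and the toy values are assumptions made for illustration, and the 1x1 lateral and 3x3 smoothing convolutions used in a full FPN are omitted.

```python
# Generic FPN-style top-down merge on single-channel feature maps.
# Illustrative only: hypothetical helper names, toy values, and no
# convolutions; this is not the SEG-FPN architecture from the paper.

def upsample2x(fmap):
    """Nearest-neighbour upsampling: every cell becomes a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_merge(bottom_up):
    """Merge maps ordered fine -> coarse, each half the size of the previous.

    Returns the merged maps in the same fine -> coarse order.
    """
    merged = [bottom_up[-1]]  # the coarsest map is passed through unchanged
    for lateral in reversed(bottom_up[:-1]):
        up = upsample2x(merged[0])
        # Element-wise sum of the finer map and the upsampled coarser map.
        fused = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]
        merged.insert(0, fused)
    return merged


# Toy pyramid: 4x4, 2x2 and 1x1 maps filled with constant values.
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down_merge([c3, c4, c5])
```

In this toy pyramid every cell of the finest merged map p3 sums contributions from all three levels, which is the property that lets a detector working on a fine-resolution map draw on deep-layer features when handling small text.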