Text Detection Based on Convolutional Neural Networks with Spatial Pyramid Pooling.

Rui Zhu, Xiao-Jiao Mao, Qi-Hai Zhu, Ning Li, Yu-Bin Yang
DOI: https://doi.org/10.1109/icip.2016.7532514
2016-01-01
Abstract: Text detection is a difficult task due to the significant diversity of text appearing in natural scene images. In this paper, we propose a novel text descriptor, SPP-net, extracted by equipping a Convolutional Neural Network (CNN) with spatial pyramid pooling. We first compute the feature maps from the original text lines without any cropping or warping, and then generate fixed-size representations for text discrimination. Experimental results on the latest ICDAR 2011 and 2013 datasets show that the proposed descriptor outperforms state-of-the-art methods by a noticeable margin on F-measure, owing to its incorporation of multi-scale text information and its flexibility in describing text regions of different sizes and shapes.
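The step from variable-size feature maps to a fixed-size representation can be illustrated with a minimal NumPy sketch of spatial pyramid pooling. This is not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels (1, 2, 4), the use of max pooling, and the feature-map shapes are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool a (channels, height, width) feature map into a fixed-length vector.

    Hypothetical helper: the levels and max pooling are illustrative
    assumptions, not values taken from the paper.
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        # Split the map into an n x n grid; floor/ceil bin edges keep
        # every bin non-empty even when height or width is smaller than n.
        for i in range(n):
            r0 = (i * height) // n
            r1 = -(-((i + 1) * height) // n)  # ceiling division
            for j in range(n):
                c0 = (j * width) // n
                c1 = -(-((j + 1) * width) // n)
                # Max over the spatial extent of the bin, one value per channel.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Output length is channels * sum(n * n for n in levels),
    # independent of the input height and width.
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
short_line = rng.standard_normal((8, 6, 20))  # stand-in for a short text line
long_line = rng.standard_normal((8, 9, 75))   # stand-in for a long text line

print(spatial_pyramid_pool(short_line).shape)  # (168,)
print(spatial_pyramid_pool(long_line).shape)   # (168,)
```

Both inputs yield a vector of length 8 × (1 + 4 + 16) = 168, so text lines of different sizes and aspect ratios can feed the same classifier without cropping or warping, and the coarse-to-fine grid is one way multi-scale information enters the descriptor.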