A Unified Deep Neural Network For Scene Text Detection

Yixin Li, Jinwen Ma
DOI: https://doi.org/10.1007/978-3-319-63309-1_10
2017-01-01
Abstract: Scene text detection is important and valuable for text recognition in natural scenes, but it remains a very challenging problem. In this paper, we propose a unified deep neural network for scene text detection, composed of a Fully Convolutional Network (FCN) for text saliency map generation and a Bounding box Regression Network (BRN) for text bounding box prediction. The FCN is trained with a hybrid loss function based on two types of pixel-wise ground truth masks, while the unified network is fine-tuned with a multitask loss function. Additionally, two post-processing procedures are applied to improve the final detection results: the predicted bounding boxes are scored by the saliency map, and redundant boxes are eliminated via Non-Maximum Suppression (NMS). Experimental results on the ICDAR2013 benchmark demonstrate that the proposed unified deep neural network achieves good text detection performance and processes images at 5 fps, faster than most existing text detection methods.
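The post-processing described in the abstract (scoring boxes by the saliency map, then NMS) can be illustrated with a minimal sketch. The abstract does not say how a box score is computed from the saliency map, so the mean saliency inside the box is used here purely as an assumption; the function names, the IoU threshold of 0.5, and the toy data are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def score_boxes(saliency, boxes):
    """Score each (x1, y1, x2, y2) box by the mean saliency inside it.
    The mean is an assumed scoring rule, not the paper's stated one."""
    scores = []
    for x1, y1, x2, y2 in boxes:
        patch = saliency[y1:y2, x1:x2]
        scores.append(float(patch.mean()) if patch.size else 0.0)
    return np.array(scores)

def iou(box, others):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Toy saliency map with one "text" region (hypothetical data).
saliency = np.zeros((20, 20))
saliency[5:10, 5:15] = 1.0

boxes = np.array([
    [5, 5, 15, 10],   # tight box around the text region
    [4, 4, 16, 11],   # looser, redundant box around the same region
    [0, 12, 10, 18],  # box over background, no overlap with the others
])

scores = score_boxes(saliency, boxes)
keep = nms(boxes, scores)
```

In this toy case the looser box overlaps the tight one heavily and scores lower, so NMS drops it. The background box survives NMS because it overlaps nothing, which suggests that a real pipeline would also need a score threshold to discard low-saliency boxes; the abstract does not state whether the authors apply one.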