Max-Pooling Based Scene Text Proposal for Scene Text Detection

Dinh Nguyen Van, Shijian Lu, Xiang Bai, Nizar Ouarti, Mounir Mokhtari
DOI: https://doi.org/10.1109/icdar.2017.213
2017-01-01
Abstract: Automatic reading of text in scenes has attracted increasing interest in recent years due to various context-awareness applications. Leveraging the advantages of object proposals in generic object detection, we propose a max-pooling based scene text proposal technique for automatic extraction of text in scenes. Given a scene image, a max-pooling based grouping technique searches for scene text proposals within a feature map computed from image edges. The resulting proposals are then ranked by a scoring function based on the histogram of oriented gradients. The proposed technique has been evaluated on two publicly available scene text datasets: the ICDAR2015 dataset and the Street View Text (SVT) dataset. Experiments show that the proposed technique obtains superior proposal performance compared with state-of-the-art methods, especially when a small number of proposals is selected. It also obtains state-of-the-art scene text spotting performance when integrated with a scene text recognition model.
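
The pipeline outlined in the abstract (edge feature map, max-pooling based grouping, ranking) can be illustrated with a rough sketch. This is not the authors' implementation: the gradient-magnitude edge map, the row-wise grouping rule, the threshold, and the mean-edge-strength score are all assumptions made for illustration, and the function names `edge_map`, `max_pool`, `propose_rows`, and `rank` are hypothetical. In particular, the paper ranks proposals with a scoring function based on the histogram of oriented gradients, which is replaced here by a much simpler stand-in.

```python
import numpy as np

def edge_map(gray):
    """Gradient-magnitude edge map (assumed stand-in for the paper's edge features)."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy)

def max_pool(feat, k):
    """Non-overlapping k x k max pooling; rows/columns that do not fill a window are dropped."""
    h, w = feat.shape
    h, w = h - h % k, w - w % k
    return feat[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def propose_rows(pooled, k, thresh):
    """Group horizontally adjacent pooled cells above thresh into boxes (x0, y0, x1, y1) in pixels.

    Hypothetical grouping rule, chosen only to keep the sketch short.
    """
    boxes = []
    for r, row in enumerate(pooled > thresh):
        start = None
        # The trailing False closes any run that reaches the right border.
        for c, active in enumerate(list(row) + [False]):
            if active and start is None:
                start = c
            elif not active and start is not None:
                boxes.append((start * k, r * k, c * k, (r + 1) * k))
                start = None
    return boxes

def rank(boxes, edges):
    """Sort boxes by mean edge strength, highest first (assumed score, not the HOG-based one)."""
    return sorted(boxes, key=lambda b: edges[b[1]:b[3], b[0]:b[2]].mean(), reverse=True)

if __name__ == "__main__":
    img = np.zeros((16, 16))
    img[4:8, 4:12] = 1.0  # a bright bar standing in for a text line
    edges = edge_map(img)
    print(rank(propose_rows(max_pool(edges, 4), 4, 0.6), edges))
```

Pooling before grouping means that neighbouring strong edge responses, such as the strokes of adjacent characters, fall into adjacent active cells and merge into a single candidate region.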