A quadrilateral scene text detector with two-stage network architecture

Siwei Wang,Yudong Liu,Zheqi He,Yongtao Wang,Zhi Tang
DOI: https://doi.org/10.1016/j.patcog.2020.107230
IF: 8
2020-01-01
Pattern Recognition
Abstract:Many of the state-of-the-art methods can only localize scene texts with rotated rectangle boundaries, which may result in incorrect rectification of the detected scene texts and erroneous elimination of proposals or detections during non-maximum suppression (NMS). A few existing methods that can detect scene texts with quadrilateral boundaries, are just based on one-stage architectures or sliding windows scanning and thus have sub-optimal performance. To address these problems, we propose an end-to-end two-stage network architecture for scene text detection, which can accurately localize scene texts with quadrilateral boundaries. At the first stage, we propose a quadrilateral region proposal network (QRPN) for generating quadrilateral proposals, based on a newly proposed quadrilateral regression algorithm. At the second stage, we introduce a novel weighted RoI pooling module with learned weight masks to pool the features, and then classify the proposals and refine their shapes with the proposed quadrilateral regression algorithm again. Specially, during training, we adopt a dual-branch structure of detection heads, that is, jointly train the quadrilateral detection head and an additional rotated rectangle detection head. Furthermore, we develop an accelerated NMS algorithm with O(nlogn) complexity, for redundant quadrilateral text proposals and detections eliminating during the first and the second stage, respectively. Experiments on several challenging benchmarks demonstrate the superior performance of the proposed method, which achieves state-of-the-art results on widely used benchmarks ICDAR 2017 MLT, RCTW, and ICDAR 2015 Incidental Scene Text benchmark. (C) 2020 Elsevier Ltd. All rights reserved.
What problem does this paper attempt to address?