Efficient and Accurate Text Detection Combining Differentiable Binarization with Semantic Segmentation

Yue Liu, Ying Shi, Chaojun Lin, Jie Hua, Ziqi Huang
DOI: https://doi.org/10.1007/978-3-031-15934-3_52
2022-01-01
Abstract: Recently, segmentation-based methods have quickly become the mainstream in scene text detection, owing to their precise description of arbitrary-shaped text. However, their reduced inference speed hinders practical application. In this paper, we propose an efficient and accurate arbitrary-shaped text detector named ViT-Bilateral DBNet, which improves the efficiency of feature processing to achieve a good tradeoff between accuracy and real-time performance. Specifically, we first combine Differentiable Binarization (DB) with the real-time semantic segmentation network BiSeNet V2, which is better suited to processing features for segmentation-based methods. Three improvements are then proposed to optimize the initial integrated network. The ViT-Bilateral Network strengthens the feature extraction capability of the network. The Attention-driven Aggregation Layer (AAL) adaptively fuses the details and semantics produced by the ViT-Bilateral Network. Meanwhile, an auxiliary loss is added to make training more sufficient. Compared with the original DBNet, our method not only gains 1.17% (on IC15) and 1.34% (on CTW1500) improvements, but also runs 1.38 times and 1.34 times faster, respectively. Notably, our detector surpasses the previous best record and maintains a high inference speed.
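For readers unfamiliar with Differentiable Binarization, the following is a minimal sketch of the DB step as it is defined in the original DBNet work, not of the ViT-Bilateral DBNet architecture itself. It assumes a probability map P and a learned threshold map T of the same shape, and computes the approximate binary map B = 1 / (1 + exp(-k (P - T))). The function names, the toy inputs, and the amplification factor k = 50 are illustrative assumptions; the abstract does not state which value this paper uses.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are equally shaped 2D lists of floats in [0, 1].
    k is the amplification factor (assumed value, not taken from the abstract).
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


def hard_binarization(prob_map, threshold=0.5):
    """Standard non-differentiable binarization with a fixed threshold, for comparison."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in prob_map]


# Hypothetical 2x2 probability and threshold maps.
prob = [[0.9, 0.2], [0.5, 0.31]]
thresh = [[0.3, 0.3], [0.5, 0.3]]

soft = differentiable_binarization(prob, thresh)
hard = hard_binarization(prob)
```

Because the steep sigmoid is differentiable everywhere, the threshold map can be trained jointly with the probability map, whereas the fixed-threshold step function passes no gradient.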