Abstract: Scene text detection is a fundamental research problem in image processing with broad practical applications. Segmentation-based methods offer excellent post-processing algorithms, but their feature processing is time-consuming. Real-time semantic segmentation methods use lightweight backbone networks for feature extraction and aggregation but lack effective post-processing. Pure convolutional networks improve model performance by redesigning key components. Combining the advantages of these three types of methods, we propose a Pure Convolutional Bilateral Segmentation Network (PCBSNet) for real-time natural scene text detection. First, we constructed a bilateral feature extraction backbone network to significantly improve detection speed. The low-level detail extraction branch captures spatial information, while the efficient semantic extraction branch accurately captures semantic features through a series of micro designs. Second, we built an efficient attention aggregation module that guides the adaptive aggregation of features from the two branches. The fused feature map then undergoes feature enhancement to yield a more accurate and reliable feature representation. Finally, we used differentiable binarization post-processing to construct text instance boundaries. To evaluate the proposed model, we compared it with mainstream lightweight models on three datasets: ICDAR2015, MSRA-TD500, and CTW1500. PCBSNet achieved F-measures of 82.9%, 82.8%, and 78.9%, respectively, at 59.1, 94.3, and 75.5 frames per second (FPS). We also conducted extensive ablation experiments on the ICDAR2015 dataset to validate each proposed improvement. The results indicate that the proposed model significantly improves inference speed while also improving accuracy, and that it is competitive with other advanced detection methods.
However, the detection performance of PCBSNet on curved text still needs improvement.
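The abstract names differentiable binarization as the post-processing step but does not state its formula. As a rough illustration only, the sketch below uses the commonly cited formulation B = 1 / (1 + exp(-k (P - T))), where P is a predicted probability, T is a learned threshold, and k is an amplification factor. The function names and the default k = 50 are assumptions made for this example, not details taken from PCBSNet.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Soft step function: approaches 1 when prob > thresh, 0 when prob < thresh.

    k (assumed default 50) controls how sharply the output switches.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the soft step element-wise to two equally sized 2D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 probability and threshold maps.
    probs = [[0.9, 0.1], [0.5, 0.7]]
    thresholds = [[0.3, 0.3], [0.5, 0.3]]
    for row in binarize_map(probs, thresholds):
        print([round(v, 3) for v in row])
```

Because the soft step is smooth, it can be trained jointly with the segmentation network, whereas a hard threshold has zero gradient almost everywhere.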
