Abstract: Scene text detection is the task of locating text regions in a scene image and marking them with bounding boxes. With the rapid development of the mobile Internet and the growing popularity of mobile devices such as smartphones, scene text detection has attracted considerable research attention and has been widely applied. In recent years, the rise of deep learning, represented by convolutional neural networks, has driven new progress in this area. However, scene text detection remains very challenging for two reasons. First, images of natural scenes often have complex backgrounds, which can easily interfere with detection. Second, text in natural scenes is highly diverse: horizontal, skewed, straight, and curved text may all appear in the same scene. In addition, when a convolutional neural network extracts features, convolutional layers with a limited receptive field cannot model global semantic information well. This paper therefore proposes a scene text detection algorithm based on dual-branch feature extraction. A residual correction branch (RCB) enlarges the receptive field so that the network captures contextual information over a wider area. To use the extracted features more efficiently, a two-branch attentional feature fusion (TB-AFF) module is built on FPN; it combines global and local attention to pinpoint text regions, increase the network's sensitivity to them, and accurately locate text in natural scenes. Several sets of comparative experiments were conducted against current mainstream text detection methods, and the proposed method achieved better results in each, which verifies its effectiveness.
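The abstract does not specify the internals of TB-AFF, so the following is only a minimal NumPy sketch of the general idea of fusing two feature maps with combined local and global attention. The function name `fuse_two_branch`, the weight matrices `w_local` and `w_global`, and the single-layer attention branches are all assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_two_branch(x, y, w_local, w_global):
    """Fuse two feature maps of shape (C, H, W).

    w_local and w_global are hypothetical (C, C) channel-mixing weights;
    a real module would likely use learned 1x1 convolutions with
    normalization and nonlinearities.
    """
    s = x + y
    # Local branch: per-pixel channel mixing (equivalent to a 1x1 convolution).
    local = np.einsum("oc,chw->ohw", w_local, s)
    # Global branch: global average pooling, then channel mixing -> (C, 1, 1).
    pooled = s.mean(axis=(1, 2))
    glob = (w_global @ pooled)[:, None, None]
    # Attention weights in (0, 1); the global term broadcasts over H and W.
    a = sigmoid(local + glob)
    # Soft selection between the two inputs at every channel and position.
    return a * x + (1.0 - a) * y

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
y = rng.standard_normal((C, H, W))
w_local = rng.standard_normal((C, C)) * 0.1
w_global = rng.standard_normal((C, C)) * 0.1
fused = fuse_two_branch(x, y, w_local, w_global)
```

Because the attention weight lies strictly between 0 and 1, each fused value is a convex combination of the two inputs at that location; with all-zero weights the sketch reduces to a plain average of `x` and `y`.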

EF<sup>2</sup>Net: Better Extracting, Fusing and Focusing Text Features for Scene Text Detection

A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

Scene Text Detection Based on Two-Branch Feature Extraction

Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection

Natural scene text detection based on attention mechanism and deep multi-scale feature fusion

MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

Enhanced EAST - Improving Network's Feature Extraction Ability and Text Complete Shape Perception

HFENet: Hybrid Feature Enhancement Network for Detecting Texts in Scenes and Traffic Panels

EK-Net: Real-time Scene Text Detection with Expand Kernel Distance

FDTA: Fully Convolutional Scene Text Detection with Text Attention

Bi-Directional Feature Fusion For Fast And Accurate Text Detection Of Arbitrary Shapes

Feature Pyramid Based Scene Text Detector

Text Detection And Recognition Algorithm For Arbitrary Shapes In Natural Scenes

A Multi-Level Feature Fusion Network for Scene Text Detection with Text Attention Mechanism

Feature Enhancement Network: A Refined Scene Text Detector

EAST: An Efficient and Accurate Scene Text Detector

CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion

An End-to-End Scene Text Detector with Dynamic Attention

A New Parallel Detection-Recognition Approach for End-to-End Scene Text Extraction

TextFuseNet: Scene Text Detection with Richer Fused Features