Abstract: Scene text detection locates text regions in a scene image and marks them with text boxes. With the rapid development of the mobile Internet and the growing popularity of mobile devices such as smartphones, scene text detection has attracted considerable research attention and has been widely applied. In recent years, the rise of deep learning, represented by convolutional neural networks (CNNs), has driven further progress in the field. Nevertheless, scene text detection remains challenging for two reasons. First, images of natural scenes often have complex backgrounds that easily interfere with detection. Second, text in natural scenes is highly diverse: horizontal, skewed, straight, and curved text may all appear in the same scene. Moreover, when a CNN extracts features, convolutional layers with a limited receptive field cannot model global semantic information well. This paper therefore proposes a scene text detection algorithm based on dual-branch feature extraction. A residual correction branch (RCB) enlarges the receptive field so that the network captures contextual information over a wider area. In addition, to use the extracted features more efficiently, a two-branch attentional feature fusion (TB-AFF) module built on the feature pyramid network (FPN) combines global and local attention to pinpoint text regions, increase the network's sensitivity to them, and accurately locate text in natural scenes. Several sets of comparative experiments against current mainstream text detection methods were conducted, and the proposed method achieved better results in all of them, verifying its effectiveness.
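The abstract does not spell out the internals of TB-AFF, so the following is only a minimal sketch of the general idea of fusing two feature maps with combined local and global attention. It assumes a common attentional-fusion pattern: a per-pixel (local) branch and a globally pooled (global) branch are summed, passed through a sigmoid, and used as blending weights. The names `fuse`, `pointwise`, `global_avg_pool`, `w_local`, and `w_global` are hypothetical, and the weights would be learned in a real network.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def pointwise(feat, weight):
    """1x1 convolution: mix channels independently at every pixel.
    feat has shape [C][H][W]; weight has shape [C_out][C]."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[[sum(weight[o][c] * feat[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for o in range(len(weight))]

def global_avg_pool(feat):
    """Average each channel over all pixels, giving a [C][1][1] map."""
    return [[[sum(map(sum, ch)) / (len(ch) * len(ch[0]))]] for ch in feat]

def fuse(x, y, w_local, w_global):
    """Blend feature maps x and y (both [C][H][W]) with attention weights.
    Assumption: this is a generic attentional fusion, not the paper's exact module."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    s = [[[x[k][i][j] + y[k][i][j] for j in range(w)]
          for i in range(h)] for k in range(c)]
    local = pointwise(s, w_local)                   # per-pixel context, [C][H][W]
    glob = pointwise(global_avg_pool(s), w_global)  # image-level context, [C][1][1]
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for k in range(c):
        for i in range(h):
            for j in range(w):
                # The global term is broadcast over all pixels of channel k.
                a = sigmoid(local[k][i][j] + glob[k][0][0])
                out[k][i][j] = a * x[k][i][j] + (1.0 - a) * y[k][i][j]
    return out
```

Because the attention weight lies between 0 and 1, each fused value is a convex combination of the two inputs, so the module can favour one branch at text pixels and the other elsewhere without leaving the range of the input features.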
