A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion

Nianfeng Li, Zhenyan Wang, Yongyuan Huang, Jia Tian, Xinyuan Li, Zhiguo Xiao
DOI: https://doi.org/10.3390/s24123758
IF: 3.9
2024-06-10
Sensors
Abstract: Scene text detection is an important research field in computer vision, playing a crucial role in various application scenarios. However, existing scene text detection methods often fail to achieve satisfactory results when faced with text instances of different sizes, shapes, and complex backgrounds. To address the challenge of detecting diverse texts in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. This method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features of different scales, enhancing the network's perception of text regions and improving its feature extraction capabilities. Simultaneously, an improved cascaded feature fusion module (PFFM) is used to fully integrate the extracted feature maps, expanding the receptive field of features and enriching the expressive ability of the feature maps. Finally, to process the concatenated feature maps, a lightweight subspace attention module (SAM) is introduced to partition them into several subspace feature maps, facilitating spatial information interaction among features of different scales. Comparative experiments against existing scene text detection methods are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets. The results show that the proposed method achieves good performance in terms of accuracy, recall, and F-score, thus verifying its effectiveness and practicality.
Engineering, Electrical & Electronic; Chemistry, Analytical; Instruments & Instrumentation
What problem does this paper attempt to address?
### The Problems This Paper Attempts to Solve

This paper aims to address several key issues in text detection within natural scenes. Specifically:

1. **Challenges in Text Detection with Different Sizes, Shapes, and Complex Backgrounds**: Existing scene text detection methods often fail to achieve satisfactory detection results when dealing with text instances of varying sizes, shapes, and complex backgrounds. Therefore, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascade feature fusion.
2. **Insufficient Feature Extraction and Weak Perception Ability**: Some existing methods have deficiencies in feature extraction capabilities and perception abilities of text regions, which may lead to false detections or missed detections. To address these issues, this paper introduces an improved attention feature fusion module (DSAF), combining depthwise separable convolution and attention fusion mechanisms to enhance the network's ability to capture text features at different scales and improve its feature extraction capabilities.
3. **Insufficient Feature Fusion**: Although existing feature pyramid networks can effectively fuse extracted features, directly adding high- and low-dimensional features for fusion can lead to information loss, affecting the text detection performance. To this end, this paper proposes an improved cascade feature fusion module (PFFM), which achieves sufficient fusion of features at different scales through a pyramid pooling module and cascade feature fusion, expanding the network's receptive field and enriching the expression capabilities of the feature maps.

In summary, this paper proposes a new multi-scale natural scene text detection method to improve detection accuracy, recall rate, and F-score, thereby validating its effectiveness and practicality.
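
The summary evaluates the method by recall and F-score but does not spell out how the F-score is computed. As a minimal sketch, assuming the common detection convention in which the F-score is the harmonic mean of precision and recall (the F1 form), the calculation looks like the following. The function names and the counts of true positives, false positives, and false negatives are hypothetical illustrations, not values or code from the paper.

```python
from typing import Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 form)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) from matched-detection counts.

    tp: detected boxes matched to a ground-truth text instance
    fp: detected boxes with no matching ground truth (false detections)
    fn: ground-truth text instances with no matching detection (missed detections)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical counts for illustration only; not results from the paper.
precision, recall, f1 = detection_metrics(tp=80, fp=20, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} F-score={f1:.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F-score by reducing false detections alone while missing many text instances, or the reverse. This is why the two failure modes named in the summary, false detections and missed detections, both matter for the reported score.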