EF²Net: Better Extracting, Fusing and Focusing Text Features for Scene Text Detection

Xiangyang Qu, Chongyang Zhang
DOI: https://doi.org/10.1109/AINIT59027.2023.10212684
2023-01-01
Abstract: Text detection in natural scene images is a challenging task that requires localizing and fitting text regions. Existing natural scene text detection methods use fixed-size convolutional kernels to extract text instance features and have achieved good results. However, text regions in natural scenes often have extremely large aspect ratios, so a fixed-size kernel also covers the surrounding background. This introduces background noise and reduces detection accuracy. In addition, complex backgrounds in natural scenes may cause existing methods to detect background regions as text, while small and ambiguous text may be missed. To address these challenges, we first adopt a new backbone with multi-branch depth band convolution to better capture text features with large aspect ratios against multi-scale backgrounds. Second, we propose a novel feature pyramid network (FPN) that captures detailed information and scale-sequence features to enhance the features of small text. Finally, we design a dynamic text detection head that integrates three attention mechanisms into the detection head. By perceiving along three dimensions (scale, space, and channel), the head enhances multi-scale text region features, focuses on foreground targets, and accurately locates text regions, thereby reducing false and missed detections. Overall, the proposed method performs well on natural scene text detection and addresses several problems of existing methods. Experimental results show that the proposed model comprehensively outperforms the text detection baseline.
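The abstract argues that a fixed square kernel picks up background on text with extreme aspect ratios, and that band-shaped kernels in several branches avoid this. The toy sketch below illustrates that argument on a single-channel binary mask; it is not the authors' backbone. Assumptions made only for illustration: uniform (box) weights instead of learned ones, one channel, each branch being a single 1×k or k×1 kernel, and branch fusion by element-wise maximum. The names `box_conv`, `multi_branch_band_conv`, and `text_line` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def box_conv(x: Grid, kh: int, kw: int) -> Grid:
    """Single-channel box convolution with a uniform kh x kw kernel.

    Weights are 1 / (kh * kw); zero padding keeps the output the same size.
    Odd kernel sizes are assumed so the kernel is centred on each pixel.
    """
    h, w = len(x), len(x[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-ph, ph + 1):
                for dj in range(-pw, pw + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the grid
                        total += x[ii][jj]
            out[i][j] = total / (kh * kw)
    return out


def multi_branch_band_conv(x: Grid, sizes: Sequence[int] = (3, 5, 7)) -> Grid:
    """Hypothetical multi-branch band convolution (illustration only).

    Each branch is one horizontal (1 x k) or vertical (k x 1) band kernel.
    Branch outputs are fused by an element-wise maximum, an assumption chosen
    for readability; a trained model would combine branches with learned weights.
    """
    branches = []
    for k in sizes:
        branches.append(box_conv(x, 1, k))  # horizontal band branch
        branches.append(box_conv(x, k, 1))  # vertical band branch
    h, w = len(x), len(x[0])
    return [[max(b[i][j] for b in branches) for j in range(w)] for i in range(h)]


def text_line(h: int, w: int, row: int, c0: int, c1: int) -> Grid:
    """Binary mask: a one-pixel-high horizontal 'text line' on zero background."""
    return [[1.0 if (i == row and c0 <= j <= c1) else 0.0 for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    x = text_line(9, 15, row=4, c0=2, c1=12)
    square = box_conv(x, 5, 5)[4][7]  # only 5 of the 25 taps land on text
    band = box_conv(x, 1, 5)[4][7]    # all 5 taps land on text
    fused = multi_branch_band_conv(x)[4][7]
    print(f"5x5 square: {square:.2f}, 1x5 band: {band:.2f}, multi-branch: {fused:.2f}")

    xt = [list(col) for col in zip(*x)]  # the same line rotated to vertical
    print(f"vertical line, multi-branch: {multi_branch_band_conv(xt)[7][4]:.2f}")
```

At the centre of the line, the 5×5 square kernel averages in mostly background (0.20), while the matching band kernel sees only text (1.00). Because the sketch keeps both horizontal and vertical branches, the fused response stays at 1.00 when the same line is rotated to vertical, which is one way to read the motivation for using multiple branches.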
What problem does this paper attempt to address?