Improved CTPN Based Attention Mechanism for Scene Text Detection

Qian Sun, Peng Ji, Zhong-yue Xiao
DOI: https://doi.org/10.1109/ICBAR58199.2022.00045
2022-11-01
Abstract: Text is everywhere in the natural environment and is one of the main ways people convey and exchange information. In recent years, text detection and recognition in natural scene images has become an active research topic in computer vision, natural language processing, and instant translation. Global contextual information is crucial for natural scene text detection. However, most existing approaches rely on convolutional neural networks, whose convolution operations are local, which makes it difficult to capture global context directly. Inspired by the Swin-Transformer, which has a powerful global modeling capability, this paper proposes an improved CTPN model for natural scene text detection that replaces the VGG16 backbone with a Swin-Transformer and uses a feature pyramid network (FPN) to fuse features at different scales, improving detection accuracy. On the ICDAR2015 dataset, the improved CTPN model with the Swin-Transformer backbone raises accuracy, recall, and F1 by 1.3%, 3.3%, and 2.4%, respectively, compared with the original CTPN model. The experimental results show that the improved CTPN algorithm can effectively and accurately locate text in natural scenes, laying a solid foundation for subsequent text recognition and lab sheet interpretation.
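The multi-scale fusion mentioned in the abstract can be illustrated with a minimal sketch of an FPN-style top-down pathway. This is not the authors' implementation: the abstract does not give the number of stages, the channel widths, or the fusion details, so the sketch assumes single-channel feature maps, stages that each halve the spatial resolution, nearest-neighbour upsampling, and plain element-wise addition. The 1x1 lateral and 3x3 smoothing convolutions of a real FPN are omitted, and all function names are hypothetical.

```python
from typing import List

# Single-channel H x W feature map; a real backbone emits many channels.
FeatureMap = List[List[float]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps with identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(stages: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse backbone stages ordered fine to coarse.

    Assumes each stage is exactly half the size of the previous one.
    Returns the fused maps in the same fine-to-coarse order.
    """
    fused = [stages[-1]]  # the coarsest map passes through unchanged
    for lateral in reversed(stages[:-1]):
        # Upsample the coarser fused map and add it to the finer stage.
        fused.append(add_maps(lateral, upsample_nearest_2x(fused[-1])))
    return fused[::-1]


# Toy stand-ins for three backbone stages (4x4, 2x2, 1x1).
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[2.0] * 2 for _ in range(2)]
c4 = [[4.0]]
p2, p3, p4 = fpn_top_down([c2, c3, c4])
```

In this toy run the coarse, semantically strong value from the last stage propagates into every finer map, which is the property that lets a detector use global context when localizing small text regions.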
Computer Science