A Graph-Transformer Network for Scene Text Detection.

Yongrong Wu, Jingyu Lin, Houjin Chen, Dinghao Chen, Lvqing Yang, Jianbing Xiahou
DOI: https://doi.org/10.1007/978-981-99-4761-4_57
2023-01-01
Abstract: Detecting text in natural images with varying orientations and shapes is challenging. Existing detectors often fail on text instances with extreme aspect ratios. This paper introduces GTNet, a Graph-Transformer network for scene text detection. GTNet uses a Graph-based Shared Feature Learning Module (GSFL) for feature extraction and a Transformer-based Regression Module (TRM) for bounding box prediction. The architecture offers a flexible receptive field, combining global attention and local features for enhanced text representation. Extensive experiments show that the method surpasses existing detectors in accuracy and effectiveness.