ESRNet: an exploring sample relationships network for arbitrary-shaped scene text detection

Huageng Fan, Tongwei Lu
DOI: https://doi.org/10.1007/s10489-024-05773-8
IF: 5.3
2024-09-20
Applied Intelligence
Abstract: Recently, transformer-based scene text detection methods have been increasingly investigated. However, these methods usually use attention to model visual content relationships within a single sample, ignoring the relationships between samples. Exploring sample relationships enables feature propagation between samples, which helps the detector handle scene text images with more complex features. To address this gap, this paper proposes the exploring sample relationships network (ESRNet) for detecting arbitrary-shaped text. Specifically, we construct the exploring sample relationships module (ESRM) to model sample relationships in the encoder, capturing interactions between all samples in each batch and propagating features across samples. Because batch sizes differ between training and testing, sample relationships are explored differently in the two phases, so a two-stream encoder is used to resolve this inconsistency. Moreover, we propose location-aware factorized self-attention (LAFSA), which incorporates the sequential information of text polygon control points into the modeling and effectively improves the accuracy of label reading order in terms of visual features. Experimental results on multiple datasets demonstrate that ESRNet exhibits superior performance compared to other methods. Notably, ESRNet achieves F-measures of 88.9, 88.4, and 77.4 on the Total-Text, CTW1500, and ArT datasets, respectively.
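As a rough illustration of what "capturing interactions between all samples in each batch" can mean, the sketch below applies plain softmax attention across the batch dimension. It is not the ESRM described in the paper: the function name `cross_sample_attention` is hypothetical, each sample is reduced to a single feature vector, and the query, key, and value projections are assumed to be the identity.

```python
import math


def cross_sample_attention(batch):
    """Mix features across samples with softmax attention.

    batch: list of B feature vectors (lists of floats), one per sample.
    Assumption: identity query/key/value projections, for illustration only.
    """
    dim = len(batch[0])
    scale = 1.0 / math.sqrt(dim)
    mixed_batch = []
    for query in batch:
        # Scaled dot-product score between this sample and every sample in the batch.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in batch]
        # Numerically stable softmax over the batch dimension.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a convex combination of all samples' features.
        mixed = [sum(w * v[d] for w, v in zip(weights, batch)) for d in range(dim)]
        mixed_batch.append(mixed)
    return mixed_batch


if __name__ == "__main__":
    # With several samples, features propagate between them.
    print(cross_sample_attention([[1.0, 0.0], [0.0, 1.0]]))
    # With a batch of one, the softmax has a single term and nothing is mixed.
    print(cross_sample_attention([[1.0, 2.0]]))
```

With a batch of one, the attention weight is trivially 1 and the sample passes through unchanged. This is a simple way to see why a batch-level module can behave differently at training and test time, which is the inconsistency the abstract says the two-stream encoder is meant to handle.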
computer science, artificial intelligence