Granularity-aware Single-point Scene Text Spotting with Sequential Recurrence Self-attention

Xunquan Tong, Pengwen Dai, Xugong Qin, Rui Wang, Wenqi Ren
DOI: https://doi.org/10.1109/tcsvt.2024.3431993
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Scene text spotting, which unifies text detection and text recognition in a single framework, has made great progress in recent years. Existing methods usually adopt a fully supervised learning strategy, which relies on location annotations that are time-consuming to produce, particularly for arbitrarily shaped scene text. In this paper, we propose a weakly supervised scene text spotting method that uses single-point location labels together with the corresponding text transcriptions. Because the location annotations for challenging scene text are weak, previous weakly supervised methods built on convolutional neural networks struggle to model text feature representations at different scales in blurry or noisy scenes. In addition, because a single-point location covers only part of a text instance, it adds ambiguity to the sequence-like task of scene text recognition. To address these issues, we present a novel sequential recurrence self-attention for granularity-aware single-point scene text spotting. Specifically, we first enhance the scene text feature representations at different scales by integrating the global intra-interaction of high-level features with the low-level local features. Then, we decode the resulting granularity-aware text features into text transcriptions in a sequential recurrence self-attention manner, capturing the sequence-dependent relations in character-level semantics and locations. Extensive experiments show that our proposed method outperforms existing state-of-the-art weakly supervised scene text spotters by a large margin.