A Dual-Path Transformer Network for Scene Text Detection

Jingyu Lin, Yan, Hanzi Wang
DOI: https://doi.org/10.1109/icassp49357.2023.10094842
2023-01-01
Abstract: The prosperity of deep learning has contributed to the rapid progress of scene text detection. Among existing methods, segmentation-based methods have drawn extensive attention due to their superiority in detecting text instances of arbitrary shapes and extreme aspect ratios. However, these bottom-up methods are limited by the performance of their segmentation models. In this paper, we propose DPTNet (Dual-Path Transformer Network), a simple yet effective network that uses both global and local information for the scene text detection task. Moreover, we propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues. In addition, a bi-directional interaction module across the two paths is developed to provide complementary clues along the channel and spatial dimensions. Our DPTNet achieves state-of-the-art results on several standard benchmarks in terms of both detection accuracy and speed.