Transformer-Convolution Network for Arbitrary Shape Text Detection

Yucheng Hu, Yuting Zhang, Wenxin Yu, Tianxiang Lan, Dong Yin
DOI: https://doi.org/10.1145/3523150.3523169
2022-01-15
Abstract: Arbitrary shape text detection is a prevalent topic in computer vision. Text instances in natural scenes vary in size and shape and often appear against complex background textures. The ability to extract accurate text features is therefore critical for the subsequent detection stage. This paper proposes a novel Transformer-Convolution Network (TCNet) for the scene text detection task. TCNet contains two major modules, a CNN module and a Transformer module. The CNN module extracts local features from the input images, while the Transformer module establishes connections among the various local features. The two structures are complementary: combining local features with relative location information contributes to more precise detection, promotes convergence, and reduces the number of parameters. Extensive experiments on public datasets demonstrate excellent performance when sufficient training data is available. In particular, when training data is limited, our method achieves state-of-the-art performance on both quadrilateral and arbitrary shape text datasets.