Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection.
Xurui Sun, Jiahao Lyu, Yifei Zhang, Gangyan Zeng, Bo Fang, Yu Zhou, Enze Xie, Can Ma
DOI: https://doi.org/10.1007/978-981-99-8540-1_1
2024-01-01
Abstract: As a fundamental step in most visual text-related tasks, scene text detection has long been widely studied. However, the diversity of the foreground (aspect ratios, colors, shapes, etc.) and the complexity of the background mean that scene text detection still faces many challenges. It is often difficult to obtain discriminative text-level features for overlapping text regions or ambiguous adjacent regions, which results in suboptimal detection performance. In this paper, we propose Text-specific Region Contrast (TRC), a method based on contrastive learning that enhances the features of text regions. Specifically, to form positive and negative sample pairs for contrast-based training, we divide the regions of a scene text image into three categories: text regions, backgrounds, and text-adjacent regions. Furthermore, we design a Text Multi-scale Strip Convolutional Attention module, called TextMSCA, to refine embedding features for precise contrast. We find that the learned features focus on complete text regions and effectively tackle the ambiguity problem. Additionally, our method is lightweight and can be implemented in a plug-and-play manner while maintaining a high inference speed. Extensive experiments on multiple benchmarks verify that the proposed method consistently improves the baseline by significant margins.
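To make the three-way region split concrete, the following is a minimal sketch of a region-level contrastive loss. It is not the paper's formulation: it assumes a generic InfoNCE-style objective in which each text-pixel embedding is pulled toward the mean text embedding and pushed away from the mean background and text-adjacent embeddings. The label ids, the function names, the use of mean prototypes, and the temperature value are all illustrative assumptions, and the label map is assumed to be given.

```python
import numpy as np

# Hypothetical label ids for the three region categories (not from the paper).
BACKGROUND, TEXT, ADJACENT = 0, 1, 2


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def region_contrast_loss(embeddings, labels, temperature=0.1):
    """InfoNCE-style loss over region prototypes (illustrative assumption).

    embeddings: float array of shape (H, W, C), one feature vector per pixel.
    labels:     int array of shape (H, W) with values in {0, 1, 2}.
    """
    feats = l2_normalize(embeddings.reshape(-1, embeddings.shape[-1]))
    flat = labels.reshape(-1)

    # One prototype per region: the normalized mean of its pixel embeddings.
    protos = []
    for region in (TEXT, BACKGROUND, ADJACENT):
        mask = flat == region
        if not mask.any():
            raise ValueError(f"no pixels labelled {region}")
        protos.append(l2_normalize(feats[mask].mean(axis=0)))
    bank = np.stack(protos)  # row 0 is the positive (text) prototype

    # Anchors are the text pixels; the logits are scaled cosine similarities.
    anchors = feats[flat == TEXT]
    logits = anchors @ bank.T / temperature          # shape (N, 3)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability

    # Negative log-probability of picking the text prototype.
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())
```

Under these assumptions the loss equals log 3 when all three regions share the same embedding (the text prototype is indistinguishable from the two negatives) and approaches zero as the text features separate from the background and text-adjacent features.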