Text Position-Aware Pixel Aggregation Network with Adaptive Gaussian Threshold: Detecting Text in the Wild

Jiayu Xu, Ailiang Lin, Jinxing Li, Guangming Lu
DOI: https://doi.org/10.1109/tcsvt.2023.3285096
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: In recent years, deep learning has significantly boosted scene text detection performance, and current segmentation-based scene text detectors can produce compact bounding boxes for irregular texts. However, these methods still struggle with crowded or overlapping texts, because adjacent text instances stick together (conglutinate) in the segmentation results. To address this issue, we propose a more accurate scene text detector, the Text Position-Aware Pixel Aggregation Network, termed TPPAN. Specifically, the Adaptively Text Kernel Thresholding (ATKT) module adaptively learns a Gaussian threshold representation, rather than using a constant threshold, to obtain more accurate text kernels. Then the Text Position-Aware Region Pixel Aggregation (TPAR-PA) module predicts the text regions in relative positions and generates more accurate text contours. Extensive experiments demonstrate that the resulting detector achieves state-of-the-art performance on multi-oriented and curved scene text benchmarks.
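The contrast between a constant threshold and a spatially varying one can be illustrated with a toy one-dimensional sketch. This is not the paper's ATKT module: the threshold here is hand-built from assumed instance centers rather than learned, and the function names (`gaussian_threshold`, `binarize`, `count_segments`), the Gaussian form, and all numeric values are hypothetical choices made only for illustration.

```python
import math

def gaussian_threshold(n, centers, sigma, t_min, t_max):
    """Hypothetical per-pixel threshold: low near assumed instance centers,
    rising toward t_max away from them (hand-built, not learned)."""
    thresholds = []
    for x in range(n):
        # Gaussian response of the nearest center, in (0, 1].
        g = max(math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers)
        thresholds.append(t_max - (t_max - t_min) * g)
    return thresholds

def binarize(scores, thresholds):
    """Mark a pixel as kernel when its score exceeds its own threshold."""
    return [1 if s > t else 0 for s, t in zip(scores, thresholds)]

def count_segments(mask):
    """Count runs of consecutive 1s (a 1-D stand-in for connected components)."""
    return sum(
        1 for i, v in enumerate(mask) if v == 1 and (i == 0 or mask[i - 1] == 0)
    )

# Toy score row: two text instances with a shallow valley (0.6) between them.
scores = [0.1, 0.6, 0.9, 0.8, 0.6, 0.8, 0.9, 0.6, 0.1]

constant = [0.5] * len(scores)
adaptive = gaussian_threshold(
    len(scores), centers=[2, 6], sigma=1.0, t_min=0.3, t_max=0.9
)

merged = count_segments(binarize(scores, constant))
separated = count_segments(binarize(scores, adaptive))
print(merged, separated)
```

With the constant threshold the valley pixel stays above 0.5, so the two instances fuse into a single segment; with the position-dependent threshold the valley is rejected and the two kernels stay apart. In the actual detector the threshold representation is predicted by the network, so the fixed `centers` and `sigma` above are stand-ins only.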
engineering, electrical & electronic