Abstract: Most existing segmentation-based scene text detection methods require complicated post-processing that is separated from the training process, which greatly reduces detection performance. A previous method, DBNet, simplified post-processing and integrated it into the segmentation network. However, training that model takes a long time (1200 epochs), and it lacks sensitivity to text of various scales, so some text instances are missed. To address these two problems, we design a text detection network with Binarization of Hyperbolic Tangent (HTBNet). First, we propose the Binarization of Hyperbolic Tangent (HTB); optimizing the segmentation network jointly with HTB speeds up initial convergence and reduces the number of training epochs from 1200 to 600. Second, because the channels of a feature map at a given scale focus on different regions of the image, we devise the Multi-Scale Channel Attention (MSCA) to better represent the important features of all objects in the image. Third, considering that objects of different scales in the image cannot be detected simultaneously, we propose a novel module named Fused Module with Channel and Spatial (FMCS), which fuses the multi-scale feature maps along the channel and spatial dimensions. Finally, we adopt cross-entropy as the loss function, which measures the difference between predicted values and ground truths. Experimental results show that, compared with lightweight models, HTBNet achieves competitive performance and speed on Total-Text (F-measure: 86.0%, FPS: 30) and MSRA-TD500 (F-measure: 87.5%, FPS: 30).
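
The abstract does not state the HTB formula, so the following is only a minimal sketch of how a tanh-based differentiable binarization could look next to a sigmoid-based one in the style of DBNet. The rescaled form 0.5 * (1 + tanh(k * (P - T))), the slope k = 50, and the function names are all assumptions made for illustration, not the authors' implementation. The cross-entropy helper is the standard binary form applied to a single pixel.

```python
import math


def sigmoid_binarize(p, t, k=50.0):
    """Sigmoid-based approximate step: 1 / (1 + exp(-k * (p - t)))."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def tanh_binarize(p, t, k=50.0):
    """Assumed tanh-based approximate step (hypothetical form of HTB).

    tanh maps to (-1, 1), so it is shifted and scaled into (0, 1).
    """
    return 0.5 * (1.0 + math.tanh(k * (p - t)))


def binary_cross_entropy(pred, target, eps=1e-7):
    """Standard binary cross-entropy for one pixel, with clipping for stability."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


if __name__ == "__main__":
    # p: probability-map value, t: threshold-map value (both made-up numbers).
    for p, t in [(0.48, 0.50), (0.50, 0.50), (0.52, 0.50)]:
        b_sig = sigmoid_binarize(p, t)
        b_tanh = tanh_binarize(p, t)
        loss = binary_cross_entropy(b_tanh, 1.0)
        print(f"p={p:.2f} t={t:.2f} sigmoid={b_sig:.4f} tanh={b_tanh:.4f} bce={loss:.4f}")
```

Under this assumed form, the tanh curve has slope k/2 at the threshold versus k/4 for the sigmoid with the same k, so its output moves toward 0 or 1 more quickly for pixels near the decision boundary. Whether this matches the mechanism behind the reported reduction in epochs is not something the abstract confirms.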

Scene Text Detection Using HRNet and Spatial Attention Mechanism

CRNet: A Center-aware Representation for Detecting Text of Arbitrary Shapes

Hierarchical Refined Attention for Scene Text Recognition

Real-time Scene Text Detection Based on Global Level and Word Level Features

(HTBNet) Arbitrary Shape Scene Text Detection with Binarization of Hyperbolic Tangent and Cross-Entropy

A Multi-Level Feature Fusion Network for Scene Text Detection with Text Attention Mechanism

ContourNet: Taking a Further Step Toward Accurate Arbitrary-shaped Scene Text Detection

Using of Attention for Scene Text Detection

R-Net: A Relationship Network for Efficient and Accurate Scene Text Detection

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Attention-based Feature Decomposition-Reconstruction Network for Scene Text Detection

A holistic representation guided attention network for scene text recognition

Deep Neural Network with Attention Model for Scene Text Recognition

Text-Attentional Convolutional Neural Network for Scene Text Detection

Scene Text Detection with Fully Convolutional Neural Networks

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

ADNet: Rethinking the Shrunk Polygon-Based Approach in Scene Text Detection

High-speed Scene Text Detection with Attention and Multi-scale Label Generation

Character Region Awareness Network for Scene Text Recognition

SVTR-SRNet: A Deep Learning Model for Scene Text Recognition via SVTR Framework and Spatial Reduction Mechanism

Text-Attentional Convolutional Neural Networks for Scene Text Detection