Abstract: Text in intelligent transportation scenes carries a large amount of information. Fully harnessing this information is one of the critical drivers for advancing intelligent transportation. Unlike general scenes, text detection in transportation has additional demands, such as fast inference speed in addition to high accuracy. Most existing real-time text detection methods are based on the shrink mask, which loses some geometric semantic information and requires complex post-processing. In addition, previous methods usually focus on the correctness of the final output; they ignore feature correction and lack guidance during the intermediate process. To this end, we propose an efficient multi-scene text detector, SM-Net, that contains an effective text representation, the similar mask (SM), and a feature correction module (FCM). Unlike previous methods, the former aims to preserve the geometric information of the instances as much as possible. Its post-processing takes 50% less time while reconstructing text contours accurately and efficiently. The latter encourages false positive features to move away from the positive feature center, optimizing the predictions at the feature level. Ablation studies demonstrate the efficiency of the SM and the effectiveness of the FCM. Moreover, the deficiencies of existing traffic datasets (such as low-quality annotations or unavailable closed-source data) motivated us to collect and annotate a traffic text dataset that includes motion blur. In addition, to validate the scene robustness of SM-Net, we conduct experiments on traffic, industrial, and natural scene datasets. Extensive experiments verify that it achieves state-of-the-art (SOTA) performance on several benchmarks. The code and dataset are available at: https://github.com/fengmulin/SMNet

Character-Level Street View Text Spotting Based on Deep Multisegmentation Network for Smarter Autonomous Driving

MT-SSD: Single-Stage 3D Object Detector Based on Magnification Transformation

Scene Text Detection and Recognition System for Visually Impaired People in Real World

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Text Detection in Scene Images Based on Exhaustive Segmentation

Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification

A Direct Regression Scene Text Detector with Position-Sensitive Segmentation

Multi-oriented Scene Text Detection via Corner Localization and Region Segmentation

Street View Text Recognition With Deep Learning for Urban Scene Understanding in Intelligent Transportation Systems

Scene Text Recognition Via Dual-path Network with Shape-driven Attention Alignment

Accurate Scene Text Detection Via Scale-Aware Data Augmentation and Shape Similarity Constraint

Mlts: A Multi-Language Scene Text Spotter

Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes

CVTD: A Robust Car-Mounted Video Text Detector

A Text-Context-Aware CNN Network for Multi-oriented and Multi-language Scene Text Detection

Towards End-to-End Text Spotting in Natural Scenes

Detecting Text in the Wild with Deep Character Embedding Network

Character Region Awareness Network for Scene Text Recognition

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

MTSTR: Multi-task learning for low-resolution scene text recognition via dual attention mechanism and its application in logistics industry