Abstract: Visually impaired (VI) people around the world have difficulty socializing and traveling because of the limitations of traditional assistive tools. In recent years, practical assistance systems for scene text detection and recognition have allowed VI people to obtain text information from their surroundings. However, real-world scene text features complex backgrounds, low resolution, variable fonts, and irregular arrangement, all of which make robust scene text detection and recognition difficult. In this paper, we propose a scene text recognition system to help VI people. First, we propose a high-performance neural network that detects and tracks objects and is applied to specific scenes to obtain Regions of Interest (ROI). To achieve real-time detection, we build a lightweight deep neural network from depth-wise separable convolutions, which enables the system to be integrated into mobile devices with limited computational resources. Second, we train the neural network on textural features to improve the precision of text detection. Our algorithm uses spatial transformer networks to suppress the effects of spatial transformations (translation, scaling, rotation, and other geometric transformations). An open-source optical character recognition (OCR) engine is trained on individual scene texts to improve the accuracy of text recognition. The interactive system then conveys the number and distance information of inbound buses to visually impaired users. Finally, a comprehensive set of experiments on several benchmark datasets demonstrates that our algorithm achieves a strong trade-off between precision and resource usage.
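To illustrate why depth-wise separable convolutions suit devices with limited computational resources, the following minimal sketch compares the parameter and multiply-accumulate (MAC) counts of a standard convolution with those of its depth-wise separable counterpart. The layer sizes (a 3x3 kernel, 64 input channels, 128 output channels, a 56x56 feature map) and the function names are assumptions chosen for illustration; they are not taken from the proposed network.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Parameters and MACs of a k x k standard convolution.

    Assumes no bias, stride 1, and an h x w output feature map.
    """
    params = k * k * c_in * c_out
    macs = params * h * w
    return params, macs


def depthwise_separable_cost(k, c_in, c_out, h, w):
    """Parameters and MACs of a depth-wise separable convolution.

    A k x k depth-wise convolution (one filter per input channel) is
    followed by a 1 x 1 point-wise convolution that mixes channels.
    """
    depthwise_params = k * k * c_in
    pointwise_params = c_in * c_out
    params = depthwise_params + pointwise_params
    macs = params * h * w
    return params, macs


# Hypothetical layer sizes, chosen only for illustration.
K, C_IN, C_OUT, H, W = 3, 64, 128, 56, 56

std_params, std_macs = standard_conv_cost(K, C_IN, C_OUT, H, W)
sep_params, sep_macs = depthwise_separable_cost(K, C_IN, C_OUT, H, W)

# The cost ratio equals 1 / c_out + 1 / k**2, independent of c_in, h, and w.
reduction = sep_params / std_params

print(f"standard:  {std_params} params, {std_macs} MACs")
print(f"separable: {sep_params} params, {sep_macs} MACs")
print(f"separable / standard = {reduction:.4f}")
```

Under these assumptions the separable layer needs roughly one ninth of the parameters and MACs of the standard layer, because the ratio is dominated by the 1 / k² term when the output channel count is large.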

Scene Text Detection and Recognition System for Visually Impaired People in Real World