Abstract: Visually impaired (VI) people around the world have difficulty socializing and traveling because of the limitations of traditional assistive tools. In recent years, practical assistance systems for scene text detection and recognition have allowed VI people to obtain text information from their surroundings. However, real-world scene text features complex backgrounds, low resolution, variable fonts, and irregular arrangement, which make robust scene text detection and recognition difficult. In this paper, we propose a scene text recognition system to help VI people. First, we propose a high-performance neural network to detect and track objects, which is applied to specific scenes to obtain Regions of Interest (ROI). To achieve real-time detection, we build a lightweight deep neural network using depth-wise separable convolutions, which allows the system to be integrated into mobile devices with limited computational resources. Second, we train the neural network on textural features to improve the precision of text detection. Our algorithm uses spatial transformer networks to suppress the effects of spatial transformations, including translation, scaling, rotation, and other geometric transformations. An open-source optical character recognition (OCR) engine is trained on scene texts individually to improve the accuracy of text recognition. The interactive system ultimately conveys the number and distance information of inbound buses to VI people. Finally, a comprehensive set of experiments on several benchmark datasets demonstrates that our algorithm achieves a strong trade-off between precision and resource usage.
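
To illustrate why depth-wise separable convolutions suit devices with limited computational resources, the following minimal sketch compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1 x 1 point-wise convolution. The layer sizes (`c_in`, `c_out`, `k`) and the function names are hypothetical values chosen for illustration; they are not taken from the paper's network.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv plus a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, assumed for illustration only.
c_in, c_out, k = 128, 256, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

Under these assumed sizes the separable layer needs roughly one ninth of the weights of the standard layer, which is the kind of saving that can make real-time inference on a mobile device feasible.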

MEAN: Multi-Element Attention Network for Scene Text Recognition

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

Deep Neural Network with Attention Model for Scene Text Recognition

Scene Text Detection and Recognition System for Visually Impaired People in Real World

Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling

Efficient Neural Network for Text Recognition in Natural Scenes Based on End-to-End Multi-Scale Attention Mechanism

EMU: Effective Multi-Hot Encoding Net for Lightweight Scene Text Recognition with a Large Character Set

A holistic representation guided attention network for scene text recognition

A Text-Context-Aware CNN Network for Multi-oriented and Multi-language Scene Text Detection

Text-Attentional Convolutional Neural Networks for Scene Text Detection

Scene Chinese Recognition with Local and Global Attention

Convolutional Attention Networks for Scene Text Recognition

Text-Attentional Convolutional Neural Network for Scene Text Detection

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

SGBANet: Semantic GAN and Balanced Attention Network for Arbitrarily Oriented Scene Text Recognition

Scene-text aware cross-modal retrieval based on semantic matching (ChinaMM2024)

Gaussian Constrained Attention Network for Scene Text Recognition

Efficient Scene Text Detection with Textual Attention Tower

Text proposals with location-awareness-attention network for arbitrarily shaped scene text detection and recognition