Abstract: Optical character recognition (OCR) is the process of extracting text and layout information by analyzing and recognizing images that contain text. It also involves identifying the geometric location and orientation of the text and its symmetry. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on text in natural scenes, such as streets, billboards, and license plates. Unlike traditional document images, natural scenes make it challenging for computer systems to locate and read text. Image-based sequence recognition has long been a research subject in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text accurately in images of complex scenes. To address this issue, this paper proposes a new text recognition pipeline based on the convolutional recurrent neural network (CRNN). It combines DBNet (real-time scene text detection with differentiable binarization) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the effectiveness of the proposed method, we performed an experimental analysis of the algorithm and ran simulations on complex scene images drawn from existing literature data, as well as on several real datasets designed for a variety of nonstationary environments. Experimental results demonstrate that the proposed model outperforms the baseline methods on three benchmark datasets and performs on par with other approaches on existing datasets. The model overcomes CRNN's inability to recognize text in complex and multi-oriented scenes, and it achieves higher accuracy than the original CRNN model across a wider variety of application scenarios.
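The abstract names the pipeline's components without giving their formulas. As a rough illustration only, the NumPy sketch below shows common textbook forms of four of them: single-scale Retinex (log of the image minus log of a Gaussian-blurred illumination estimate), the approximate binarization from the DBNet paper, B = 1 / (1 + exp(-k(P - T))) with amplifying factor k = 50, a 180-degree direction correction, and greedy (best-path) CTC decoding of CRNN per-frame outputs. All function names, the Retinex sigma, the direction-classifier label convention (1 meaning upside-down), and the blank-at-index-0 class layout are assumptions for illustration; this is not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # Blur along rows, then along columns; "valid" mode trims the padding back off.
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def single_scale_retinex(img: np.ndarray, sigma: float = 15.0) -> np.ndarray:
    """Reflectance estimate log(I) - log(G * I), stretched to 0..255 as uint8.

    sigma = 15 is an assumed value, not one taken from the paper.
    """
    img = img.astype(np.float64) + 1.0  # offset avoids log(0)
    r = np.log(img) - np.log(gaussian_blur(img, sigma))
    r = (r - r.min()) / (r.max() - r.min() + 1e-12)
    return np.round(r * 255).astype(np.uint8)


def differentiable_binarization(prob: np.ndarray, thresh: np.ndarray, k: float = 50.0) -> np.ndarray:
    """DBNet-style approximate binary map B = 1 / (1 + exp(-k (P - T)))."""
    return 1.0 / (1.0 + np.exp(-k * (prob - thresh)))


def correct_direction(crop: np.ndarray, label: int) -> np.ndarray:
    """Rotate a text crop by 180 degrees when the hypothetical classifier outputs label 1."""
    return np.rot90(crop, 2) if label == 1 else crop


def ctc_greedy_decode(logits: np.ndarray, alphabet: str) -> str:
    """Best-path CTC decoding of CRNN outputs shaped (time_steps, len(alphabet) + 1).

    Assumes class 0 is the CTC blank and class i > 0 maps to alphabet[i - 1].
    """
    out = []
    prev = None
    for idx in logits.argmax(axis=1):
        idx = int(idx)
        # Merge consecutive repeats, then drop blanks.
        if idx != 0 and idx != prev:
            out.append(alphabet[idx - 1])
        prev = idx
    return "".join(out)
```

A blank between two identical labels keeps them separate, so the frame sequence a, a, blank, a, b, b, blank decodes to "aab". The abstract does not state the order in which enhancement, detection, direction classification, and recognition are applied, so any chaining of these functions is likewise an assumption.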

EMU: Effective Multi-Hot Encoding Net for Lightweight Scene Text Recognition with a Large Character Set

Effective Multi-Hot Encoding and Classifier for Lightweight Scene Text Recognition with a Large Character Set

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

MEAN: Multi-Element Attention Network for Scene Text Recognition

UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception

Accurate and Efficient Scene Recognition with Compact BoW and Ensemble ELM

Boundary-aware Arbitrary-shaped Scene Text Detector with Learnable Embedding Network

Attention and Language Ensemble for Scene Text Recognition with Convolutional Sequence Modeling

E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

Emu: Generative Pretraining in Multimodality

MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation

Scene Text Recognition with Sliding Convolutional Character Models

EMFANet: a lightweight network with efficient multi-scale feature aggregation for real-time semantic segmentation

A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application

Efficient scene text image super-resolution with semantic guidance

MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

A Lightweight Multi-modal Emotion Recognition Network Based on Multi-task Learning

Scene Chinese Recognition with Local and Global Attention

A Multiplexed Network for End-to-End, Multilingual OCR

Efficient Multi-domain Text Recognition Deep Neural Network Parameterization with Residual Adapters

MLTS: A Multi-Language Scene Text Spotter