Abstract: Optical character recognition (OCR) is the process of extracting text and layout information from images by analyzing and recognizing the text they contain. It also involves identifying the geometric location and orientation of the text and its symmetrical behavior. OCR usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on text in natural scenes, such as streets, billboards, and license plates. Compared with traditional document images, locating and reading text in natural scenes is a challenging task. Image-based sequence recognition is a longstanding research subject in computer vision. Great progress has been made in this field; however, most models still struggle to recognize text in images of complex scenes with high accuracy. To address this issue, this paper proposes a new text recognition approach based on the convolutional recurrent neural network (CRNN). It combines CRNN with real-time scene text detection with differentiable binarization (DBNet) for text detection and segmentation, a text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the proposed method, we performed an experimental analysis and carried out simulations on complex scene image data drawn from the existing literature and on several real datasets designed for a variety of nonstationary environments. Experimental results show that the proposed model outperforms the baseline methods on three benchmark datasets and achieves on-par performance with other approaches on existing datasets. The model addresses the inability of the original CRNN to recognize text in complex and multi-oriented scenes, and it achieves higher accuracy than the original CRNN across a wider variety of application scenarios.
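
The abstract names the Retinex algorithm as the image enhancement step but does not say which variant is used or how it is configured. Purely as an illustration, the following is a minimal sketch of single-scale Retinex in plain Python, assuming a grayscale image stored as a list of rows of non-negative numbers. The function names, the `sigma` value, and the `+ 1` offset inside the logarithm are assumptions made for this example and are not taken from the paper.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve a sequence with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(k * values[min(max(i + j - radius, 0), last)]
            for j, k in enumerate(kernel))
        for i in range(len(values))
    ]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: blur every row, then every column."""
    kernel = gaussian_kernel(sigma, radius=max(1, int(3 * sigma)))
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def single_scale_retinex(image, sigma=2.0):
    """Estimate reflectance as log(I + 1) - log(blur(I) + 1) per pixel.

    The blurred image stands in for the slowly varying illumination;
    subtracting it in the log domain leaves local contrast such as
    text strokes. The + 1 offset (an assumption) avoids log(0).
    """
    blurred = gaussian_blur(image, sigma)
    return [
        [math.log(p + 1.0) - math.log(b + 1.0) for p, b in zip(row, brow)]
        for row, brow in zip(image, blurred)
    ]


def rescale_to_uint8(image):
    """Linearly map the values of a 2-D list onto the integer range 0..255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    span = (hi - lo) or 1.0  # guard against a perfectly flat image
    return [[int(round(255 * (p - lo) / span)) for p in row] for row in image]


if __name__ == "__main__":
    # A dim region next to a bright region, as a toy uneven-lighting case.
    toy = [[10.0] * 6 + [200.0] * 6 for _ in range(4)]
    enhanced = rescale_to_uint8(single_scale_retinex(toy, sigma=1.0))
    for row in enhanced:
        print(row)
```

In a pipeline like the one the abstract describes, a step of this kind would run before detection so that the detector and recognizer see more evenly lit input. A practical implementation would more likely rely on an optimized image library and possibly a multi-scale variant.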
