Abstract:Optical character recognition (OCR) is the process of acquiring text and layout information through analysis and recognition of text data image files. It is also a process to identify the geometric location and orientation of the texts and their symmetrical behavior. It usually consists of two steps: text detection and text recognition. Scene text recognition is a subfield of OCR that focuses on processing text in natural scenes, such as streets, billboards, license plates, etc. Unlike traditional document category photographs, it is a challenging task to use computer technology to locate and read text information in natural scenes. Imaging sequence recognition is a longstanding subject of research in the field of computer vision. Great progress has been made in this field; however, most models struggled to recognize text in images of complex scenes with high accuracy. This paper proposes a new pattern of text recognition based on the convolutional recurrent neural network (CRNN) as a solution to address this issue. It combines real-time scene text detection with differentiable binarization (DBNet) for text detection and segmentation, text direction classifier, and the Retinex algorithm for image enhancement. To evaluate the effectiveness of the proposed method, we performed experimental analysis of the proposed algorithm, and carried out simulation on complex scene image data based on existing literature data and also on several real datasets designed for a variety of nonstationary environments. Experimental results demonstrated that our proposed model performed better than the baseline methods on three benchmark datasets and achieved on-par performance with other approaches on existing datasets. This model can solve the problem that CRNN cannot identify text in complex and multi-oriented text scenes. Furthermore, it outperforms the original CRNN model with higher accuracy across a wider variety of application scenarios.

Post-OCR Paragraph Recognition by Graph Convolutional Networks

Gated Recurrent Convolution Neural Network for Ocr

Characters as Graphs: Recognizing Online Handwritten Chinese Characters via Spatial Graph Convolutional Network

Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

OCR using CRNN: A Deep Learning Approach for Text Recognition

CNN Based Page Object Detection in Document Images

OCR with a Convolutional Neural Networks Integration Model in Machine Vision

Efficient Document Image Classification Using Region-Based Graph Neural Network

GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

A Convolutional Recurrent Neural-Network-Based Machine Learning for Scene Text Recognition Application

Graph Convolutional Networks for Classification with a Structured Label Space

Multilabel Recognition Algorithm With Multigraph Structure

CaptchaGG: A Linear Graphical CAPTCHA Recognition Model Based on CNN and RNN

Continual Graph Convolutional Network for Text Classification

R-GNN: recurrent graph neural networks for font classification of oracle bone inscriptions

Novel GCN Model Using Dense Connection and Attention Mechanism for Text Classification

Multi-Label Image Recognition With Graph Convolutional Networks

GazeGCN: Gaze-aware Graph Convolutional Network for Text Classification

CGNN: Caption-assisted Graph Neural Network for Image-Text Retrieval

PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis

Irregular Scene Text Detection Based on a Graph Convolutional Network