Abstract: Large-scale image classification is an important and highly challenging task in computer vision. With ongoing advances in deep learning and optical character recognition (OCR), neural networks designed for large-scale classification play an essential role in OCR systems. In this study, we developed an automatic OCR system that recognizes up to 13,070 printed Chinese characters using deep neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. Training data synthesis consists of a character font generation step and a background simulation process. Three background models are proposed to simulate the background noise and anti-counterfeiting patterns on ID cards. To increase the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset of more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet architecture by replacing the fully connected (FC) layer with a global average pooling layer to avoid overfitting caused by the massive amount of training data; this change also reduced the number of model parameters. Finally, we employed transfer learning to further refine the CNN model using a small number of real data samples. Experimental results show that the proposed approach achieves significantly better overall recognition performance than prior methods, demonstrating the effectiveness of the proposed framework, which reached a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
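To make the global average pooling step in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows how pooling collapses each feature-map channel to a single value without any learned parameters, and how that shrinks the classifier that follows. The feature-map shape (1024 channels of 7 x 7) and the helper names `global_average_pool` and `dense_params` are assumptions chosen for illustration; only the class count of 13,070 comes from the abstract.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def global_average_pool(feature_maps: List[FeatureMap]) -> List[float]:
    """Reduce each H x W channel to its mean, giving one value per channel."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_params(num_inputs: int, num_classes: int) -> int:
    """Weights plus biases of a single fully connected layer."""
    return num_inputs * num_classes + num_classes


# Assumed (hypothetical) final feature-map shape; not taken from the paper.
CHANNELS, HEIGHT, WIDTH = 1024, 7, 7
NUM_CLASSES = 13070  # number of character classes stated in the abstract

# Classifier applied to the flattened C*H*W feature map.
flat_head = dense_params(CHANNELS * HEIGHT * WIDTH, NUM_CLASSES)
# Classifier applied to the C values left after parameter-free pooling.
gap_head = dense_params(CHANNELS, NUM_CLASSES)

if __name__ == "__main__":
    demo = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
    print(global_average_pool(demo))
    print(flat_head, gap_head)
```

Under these assumed shapes, the classifier after pooling is smaller than the one on the flattened feature map by a factor of about H x W. The actual saving in the paper depends on architectural details that the abstract does not give.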

Multi-font printed Chinese character recognition using multi-pooling convolutional neural network

On Achieving Better Fault-Tolerant Capability for Recognizing Heavily Stained Printed Chinese Characters with a Four-Layer Fuzzy Neural Network

Chinese Character Captcha Recognition Based On Convolution Neural Network

Toward high-performance online HCCR: a CNN approach with DropDistortion, path signature and spatial stochastic max-pooling

Convolutional Neural Network for Machine-Printed Traditional Mongolian Font Recognition

Recognition of Chinese Characters Based on Multi-Scale Gradient and Deep Neural Network

CCRS: Web Service for Chinese Character Recognition

Chinese/English mixed Character Segmentation as Semantic Segmentation

Chinese Character CAPTCHA Recognition and Performance Estimation Via Deep Neural Network

Content-independent font recognition on a single Chinese character using sparse representation

Principal Component 2-D Long Short-Term Memory for Font Recognition on Single Chinese Characters

Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning

A Comprehensive Analysis of Misclassified Handwritten Chinese Character Samples by Incorporating Human Recognition

Improved Deep Convolutional Neural Network For Online Handwritten Chinese Character Recognition using Domain-Specific Knowledge

The CNN Based Machine-printed Traditional Mongolian Characters Recognition

FontRNN: Generating Large‐scale Chinese Fonts via Recurrent Neural Network

DropRegion training of inception font network for high-performance Chinese font recognition

Attention-based Deformable Convolutional Network for Chinese Various Dynasties Character Recognition

Discovering similar Chinese characters in online handwriting with deep convolutional neural networks

Four-channel Convolutional Chinese Handwriting Recognition Based on MobileNetV2

Morphological Feature Aware Multi-CNN Model for Multilingual Text Recognition