Abstract: In multilingual interoperation across cross-country industrial systems, character recognition can greatly facilitate the automatic integration of information from an enormous number of forms, yet it remains an open research problem. Character recognition with deep convolutional neural networks depends on large-scale training data collection and labor-intensive labeling to train an effective model. Synthetic data generation and data augmentation are the typical means of compensating for the scarcity of labeled training data. However, the domain shift between synthetic and real data leads to unsatisfactory recognition accuracy, which poses a significant challenge. To alleviate this issue, a recognition system with enhanced two-phase transfer learning is proposed that exploits the unlabeled real data in existing industrial forms. In the framework, massive training data are generated automatically from a configurable font and character library. A proposed convolutional neural network suited to character recognition is pre-trained on the generated data to serve as the source model. In the first transfer phase, the source model is adapted to the target model in an unsupervised manner, using real samples of a specific writing style. In the second, supervised transfer phase, the target model is further optimized with the few labels available. A recognition application built on the target model is then described. The effectiveness of the proposed enhanced two-phase model transfer method is validated through systematic experiments that use a public dataset as the target domain data. Furthermore, a comparison with related works demonstrates the transferability and efficiency of the proposed framework.
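The two-phase idea can be illustrated with a toy sketch. This is not the proposed system: a nearest-centroid classifier on made-up 2-D features stands in for the convolutional network, and self-training with pseudo-labels stands in for the unsupervised adaptation step, whose actual form the abstract does not specify. The prototypes, the shift, and all function names (`make_samples`, `adapt_unsupervised`, `fine_tune_supervised`, `run_demo`) are hypothetical and chosen only for illustration.

```python
import random

# Assumption: two character classes described by hypothetical 2-D features.
PROTOTYPES = [(0.0, 0.0), (4.0, 0.0)]
# Assumption: real data is offset from synthetic data (the domain shift).
SHIFT = (1.8, 0.0)


def make_samples(rng, n_per_class, shift=(0.0, 0.0)):
    """Draw noisy points around each class prototype, optionally shifted."""
    xs, ys = [], []
    for label, (px, py) in enumerate(PROTOTYPES):
        for _ in range(n_per_class):
            xs.append((px + shift[0] + rng.uniform(-0.5, 0.5),
                       py + shift[1] + rng.uniform(-0.5, 0.5)))
            ys.append(label)
    return xs, ys


def fit_centroids(xs, ys, fallback):
    """Per-class mean; a class with no samples keeps its fallback centroid."""
    model = []
    for label, old in enumerate(fallback):
        members = [x for x, y in zip(xs, ys) if y == label]
        if members:
            model.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            model.append(old)
    return model


def predict(model, x):
    """Index of the nearest centroid (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in model]
    return dists.index(min(dists))


def accuracy(model, xs, ys):
    """Fraction of samples whose predicted class matches the label."""
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)


def adapt_unsupervised(model, unlabeled, rounds=5):
    """Phase 1 stand-in: pseudo-label the real samples, then refit."""
    for _ in range(rounds):
        pseudo = [predict(model, x) for x in unlabeled]
        model = fit_centroids(unlabeled, pseudo, fallback=model)
    return model


def fine_tune_supervised(model, xs, ys, weight=0.5):
    """Phase 2 stand-in: pull each centroid toward its few labeled samples."""
    labeled = fit_centroids(xs, ys, fallback=model)
    return [tuple((1 - weight) * old + weight * new
                  for old, new in zip(old_c, new_c))
            for old_c, new_c in zip(model, labeled)]


def run_demo(seed=0):
    rng = random.Random(seed)
    src_x, src_y = make_samples(rng, 100)            # generated training data
    real_x, _ = make_samples(rng, 100, SHIFT)        # unlabeled real data
    few_x, few_y = make_samples(rng, 3, SHIFT)       # a few labeled real samples
    test_x, test_y = make_samples(rng, 100, SHIFT)   # held-out real data

    source = fit_centroids(src_x, src_y, fallback=PROTOTYPES)
    adapted = adapt_unsupervised(source, real_x)
    target = fine_tune_supervised(adapted, few_x, few_y)
    return tuple(accuracy(m, test_x, test_y) for m in (source, adapted, target))


if __name__ == "__main__":
    print("source / adapted / fine-tuned accuracy:", run_demo())
```

In this toy setting the source model misclassifies part of the shifted real data, and refitting on pseudo-labeled real samples moves the decision boundary to where the real classes actually lie. The supervised step then only nudges the centroids with the few available labels. Real character features would not separate this cleanly, so the sketch shows the order of the stages rather than any expected accuracy gain.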
