Abstract: In the recognition of offline handwritten Chinese scripts, contextual post-processing plays a vital role in improving accuracy. In this paper, we systematically analyze the key factors that affect the performance of contextual post-processing: statistical language models (LMs), candidate confidence, candidate set size, and search strategy. We then present a hybrid post-processing system that integrates the various kinds of available information. Next, we investigate seven LMs, four methods of estimating candidate confidence, and candidate sets of different sizes, and illustrate in detail how each influences the performance of contextual post-processing. Experimental results show that the performance of the LMs is affected by training corpus size, smoothing method, and model pruning, and that lower perplexity correlates with higher accuracy. A comparison of the candidate confidence estimation methods shows that confidence estimation is vital to contextual post-processing. We also show that ensuring the correct characters are captured within a limited number of candidates is extremely important for good post-processing performance. The hybrid post-processing system achieves high accuracy while also keeping post-processing speed and memory usage in check. The average recognition accuracy on three Chinese scripts (about 66,000 characters in total) reaches 97.65%, compared with an average of 81.58% before post-processing; the error rate thus falls from 18.42% to 2.35%, an error correction rate of about 87%. Finally, we offer recommendations for choosing a suitable post-processing method for real script recognition tasks.
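To make the interaction between three of the factors named above (language model, candidate confidence, and candidate set size) concrete, here is a minimal sketch of one common formulation of contextual post-processing: a Viterbi search over the recognizer's candidate lattice that scores each path as the sum of the candidates' log confidences plus a weighted character-bigram log probability. This is not the paper's hybrid system. The bigram table, the fixed floor probability used in place of real smoothing, the weight `lm_weight`, the `top_k` truncation, and the names `viterbi_decode` and `bigram_logprob` are all hypothetical, and candidate confidences are assumed to be already normalized to probabilities.

```python
import math
from typing import Dict, List, Optional, Tuple

# One lattice position: the recognizer's candidates as (character, confidence) pairs,
# sorted by descending confidence. Confidences are ASSUMED to be probabilities in (0, 1].
Candidates = List[Tuple[str, float]]
Bigrams = Dict[Tuple[str, str], float]


def bigram_logprob(bigrams: Bigrams, prev: str, cur: str, floor: float = 1e-6) -> float:
    """log P(cur | prev) from a toy bigram table (hypothetical helper).

    Unseen pairs fall back to a fixed floor probability, a crude stand-in
    for a proper smoothing method such as Katz back-off or Kneser-Ney.
    """
    return math.log(bigrams.get((prev, cur), floor))


def viterbi_decode(
    lattice: List[Candidates],
    bigrams: Bigrams,
    lm_weight: float = 1.0,
    top_k: Optional[int] = None,
    start: str = "<s>",
) -> List[str]:
    """Choose one candidate per position, maximizing
    sum_i [log conf(c_i) + lm_weight * log P(c_i | c_{i-1})].
    """
    # Surviving partial paths as (last character, score, path); begin at the start symbol.
    states: List[Tuple[str, float, List[str]]] = [(start, 0.0, [])]
    for candidates in lattice:
        # Limit the candidate set: a correct character outside the top k can never be recovered.
        if top_k is not None:
            candidates = candidates[:top_k]
        new_states = []
        for char, conf in candidates:
            # Extend every surviving path with this candidate and keep only the best extension.
            score, path = max(
                (
                    (prev_score + math.log(conf)
                     + lm_weight * bigram_logprob(bigrams, prev, char), prev_path)
                    for prev, prev_score, prev_path in states
                ),
                key=lambda item: item[0],
            )
            new_states.append((char, score, path + [char]))
        states = new_states
    return max(states, key=lambda s: s[1])[2]


# Toy example: the recognizer ranks the wrong character 夫 above the correct 天.
lattice = [
    [("今", 0.9), ("令", 0.1)],
    [("夫", 0.6), ("天", 0.4)],
]
bigrams = {
    ("<s>", "今"): 0.1, ("<s>", "令"): 0.05,
    ("今", "天"): 0.3, ("今", "夫"): 0.001,
    ("令", "天"): 0.01, ("令", "夫"): 0.001,
}

print("".join(viterbi_decode(lattice, bigrams)))                 # 今天 (LM corrects the error)
print("".join(viterbi_decode(lattice, bigrams, lm_weight=0.0)))  # 今夫 (confidence only)
print("".join(viterbi_decode(lattice, bigrams, top_k=1)))        # 今夫 (天 was pruned away)
```

The three calls separate the factors: dropping the language model (`lm_weight=0.0`) leaves the recognizer's top choice uncorrected, and shrinking the candidate set to one (`top_k=1`) removes the correct character before the search can recover it, which mirrors the abstract's point that the correct characters must be captured within a limited number of candidates.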

An OCR post-processing approach based on multi-knowledge

An Efficient Post-Processing Approach for Off-Line Handwritten Chinese Address Recognition

A Japanese OCR Post-Processing Approach Based on Dictionary Matching

An OCR post-processing method based on dictionary matching and matrix transforming

A Post-processing Approach for Handwritten Chinese Address Recognition

An Adaptive Post-processing Method using Proofreading Information for Chinese Character Recognition

Post-Processing Approach for Printed Chinese Character Recognition

OCR Result Optimization Based on Pattern Matching

Multi-level post-processing for Korean character recognition using morphological analysis and linguistic evaluation

A Chinese OCR Spelling Check Approach Based on Statistical Language Models

Unknown-box Approximation to Improve Optical Character Recognition Performance

A hybrid post-processing system for offline handwritten Chinese script recognition

Web Knowledge Base Improved OCR Correction for Chinese Business Cards

A Multiplexed Network for End-to-End, Multilingual OCR

Survey of Post-OCR Processing Approaches

Statistical Learning for OCR Text Correction

A Cost Efficient Approach to Correct OCR Errors in Large Document Collections

Postprocessing Algorithm for the Optical Recognition of Degraded Characters

New Post-Processing Method Based on Noisy Channel Model for Chinese Character Recognition

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model