Prototype extraction and adaptive OCR

Yihong Xu, G. Nagy
DOI: https://doi.org/10.1109/34.817408
IF: 23.6
1999-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract: To maintain OCR accuracy with decreasing quality of page image composition, production, and digitization, it is essential to tune the system to each document. We propose a prototype extraction method for document-specific OCR systems. The method automatically generates training samples from unsegmented text images and the corresponding transcripts. It is tolerant of transcription errors, so a transcript produced automatically by an imperfect omnifont OCR system can be used. The method is based on new algorithms for estimating character widths, character locations in a word, and match/nonmatch probabilities from unsegmented text. An experimental word recognition system is designed and developed to combine prototype extraction algorithms and segmentation-free word recognition. The system can adapt itself to different page images and achieve high recognition accuracy on heavily degraded print.
computer science, artificial intelligence, engineering, electrical & electronic
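
As a rough illustration of how character widths might be estimated from unsegmented word images and a transcript, the sketch below assumes that each word's pixel width is approximately the sum of the widths of its characters, and solves for the per-character widths by least squares. This is a simplifying assumption made for illustration, not necessarily the algorithm used in the paper. The function name `estimate_char_widths` and the sample data are hypothetical, and inter-character spacing is ignored.

```python
import numpy as np


def estimate_char_widths(words, word_widths):
    """Least-squares estimate of per-character widths from word-level data.

    words       -- transcript strings, one per word image (hypothetical input)
    word_widths -- measured pixel width of each word image, in the same order

    Assumes width(word) ~= sum of width(char) over the characters in the word.
    """
    alphabet = sorted(set("".join(words)))
    column = {ch: i for i, ch in enumerate(alphabet)}

    # counts[r, c] = number of times character c occurs in word r
    counts = np.zeros((len(words), len(alphabet)))
    for row, word in enumerate(words):
        for ch in word:
            counts[row, column[ch]] += 1

    # Solve counts @ widths ~= word_widths in the least-squares sense.
    target = np.asarray(word_widths, dtype=float)
    widths, *_ = np.linalg.lstsq(counts, target, rcond=None)
    return dict(zip(alphabet, widths.tolist()))


if __name__ == "__main__":
    # Toy data generated with a = 10 px and b = 6 px.
    sample_words = ["ab", "ba", "aab", "b", "abb"]
    sample_widths = [16, 16, 26, 6, 22]
    print(estimate_char_widths(sample_words, sample_widths))
```

Because the system is solved in the least-squares sense, a small number of wrong transcript entries or noisy width measurements shifts the estimates rather than invalidating them, provided enough words are available to overdetermine each character.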