A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

Aishik Rakshit, Samyak Mehta, Anirban Dasgupta
2023-07-10
Abstract: Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, as well as in domains such as mobility statistics, law enforcement, traffic, and security systems. State-of-the-art methods work well on short printed text such as license plates and shop names. However, existing techniques have limited accuracy on printed textbooks and handwritten text, which may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies alone, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the limited accuracy of Optical Character Recognition (OCR) on handwritten text and printed textbooks. Existing OCR technology performs well on short printed text such as license plates and store names, but its accuracy drops on textbooks and handwriting because of similar-looking characters and variations in handwritten characters. The paper therefore proposes a post-processing method based on Natural Language Processing (NLP). Specifically, its goal is an end-to-end pipeline that first runs OCR on single-line handwritten or printed text and then applies NLP techniques to the OCR output to correct recognition errors. The method targets errors caused by similar character shapes, variations in font style and size, and inconsistent orientation. More accurate OCR output should in turn benefit downstream applications that consume the recognized text, such as text summarization, part-of-speech tagging, sentence boundary detection, topic modeling, named entity recognition, and text classification.
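To make the post-processing idea concrete, here is a minimal Python sketch of correcting a single line of OCR output. It is not the authors' method: this summary does not name the NLP tools the paper uses, so the sketch substitutes a simple dictionary lookup with fuzzy matching from the standard library's `difflib`. The vocabulary `VOCAB`, the confusable-character map `CONFUSABLE`, and the function names are all hypothetical and chosen for illustration only.

```python
import difflib

# Hypothetical toy vocabulary; a real system would use a large lexicon
# or a language model instead.
VOCAB = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

# Hypothetical map of visually similar characters that OCR may confuse.
CONFUSABLE = str.maketrans({"0": "o", "1": "l", "5": "s"})


def correct_token(token, vocab=VOCAB, cutoff=0.6):
    """Return the closest vocabulary word, or the token unchanged if none is close."""
    lowered = token.lower()  # note: this sketch does not preserve case
    if lowered in vocab:
        return lowered
    # First try undoing common look-alike substitutions (e.g. "0" for "o").
    normalized = lowered.translate(CONFUSABLE)
    if normalized in vocab:
        return normalized
    # Fall back to fuzzy matching against the vocabulary.
    matches = difflib.get_close_matches(normalized, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def postprocess_line(ocr_line):
    """Correct each whitespace-separated token of a single OCR output line."""
    return " ".join(correct_token(t) for t in ocr_line.split())


if __name__ == "__main__":
    print(postprocess_line("the qu1ck br0wn f0x"))
```

A word-level lookup like this ignores context, so it cannot choose between two valid words that look alike; a context-aware model would be needed for that, which is presumably where the NLP stage of such a pipeline earns its keep.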