Abstract: In the modern era everything is becoming smart, and everything requires better planning and organization. In the past, essential information was recorded on paper, either handwritten or printed. A paperless environment, however, requires converting handwritten or printed documents into digital copies. This can be achieved through the electronic data conversion technique known as Optical Character Recognition (OCR). OCR of some documents is difficult because of varied writing styles and poor scanned-image quality, problems that deep learning techniques can address with better accuracy. In this work, we employ Long Short-Term Memory (LSTM) for English OCR to enable paperless, effortless data storage and fast access. The records, however, may contain entities such as names, contact details, drug details, diseases, educational qualifications, and dates. OCR alone cannot separate these entities; an entity recognition framework is needed for deeper and faster data analysis. For efficient Named Entity Recognition (NER), we use the Adaptive Neuro-Fuzzy Inference System (ANFIS), powered by the CRF and BERT algorithms, to automatically label each entity by training on a vast amount of unlabeled text data. The ANFIS model incorporates both linguistic and numerical knowledge. It is more accurate than an ANN at identifying patterns and classifying data, and it is more transparent to the user. Our proposed framework aims to improve the performance of the character recognition system by using a feed-forward network. One of the main issues identified in developing this system is noise. This network provides a single input layer and a single output layer. The system has two main components, a training section and a recognition section. Both focus mainly on image acquisition and feature extraction, and both also include training and simulation of the classifier. The first step in image recognition is to extract features from the normalized image matrix. We then train the network using the proposed training algorithm. Experiments on medical records attain a high accuracy of 0.9637, a recall of 0.9627, and an F1 score of 0.9627.
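The abstract does not say how the accuracy, recall, and F1 figures are averaged. As a minimal sketch, assuming token-level labels and macro-averaging over entity types (both assumptions, as are the function name `token_metrics` and the example labels), such scores could be computed as follows:

```python
from collections import Counter


def token_metrics(gold, pred):
    """Token-level accuracy plus macro-averaged precision, recall, and F1.

    Assumption: one label per token and macro-averaging over all labels;
    the source does not state which averaging scheme was used.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("gold and pred must not be empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was missed

    labels = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for six tokens of a medical record ("O" = no entity).
gold = ["NAME", "DRUG", "O", "DATE", "O", "DRUG"]
pred = ["NAME", "DRUG", "O", "O", "O", "DRUG"]
metrics = token_metrics(gold, pred)
print(metrics)
```

In this toy example one missed `DATE` token lowers macro recall more than accuracy, because every entity type carries equal weight in the macro average regardless of how often it occurs.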

A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

Advanced Digital Image Processing Technique based Optical Character Recognition of Scanned Document

Survey of Post-OCR Processing Approaches

Handwritten optical character recognition using TransRNN trained with self improved flower pollination algorithm (SI-FPA)

A Novel Approach to Printed Arabic Optical Character Recognition

A Novel Approach to Skew-Detection and Correction of English Alphabets for OCR

CNN-Bidirectional LSTM Based Optical Character Recognition of Sanskrit Manuscripts : A Comprehensive Systematic Literature Review

Unknown-box Approximation to Improve Optical Character Recognition Performance

Handwritten OCR for Indic Scripts: A Comprehensive Overview of Machine Learning and Deep Learning Techniques

OCR using CRNN: A Deep Learning Approach for Text Recognition

OCR accuracy improvement on document images through a novel pre-processing approach

Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents

Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach

Optical Character Recognition, Using K-Nearest Neighbors

End-to-End Optical Character Recognition for Bengali Handwritten Words

Handwritten Text Recognition Using Convolutional Neural Network

Comprehensive analysis of natural language processing

A Deep Learning-Based Pre-Trained VGG19 Model for Optical Character Recognition

An offline English optical character recognition and NER using LSTM and adaptive neuro-fuzzy inference system

SuperOCR: A Conversion from Optical Character Recognition to Image Captioning