Abstract: In the modern era, everything is becoming smart, and every task calls for better planning and organization. In the past, essential information was recorded on paper, either handwritten or printed. A digital world, however, needs a paperless environment, which requires converting handwritten or printed documents into soft copies. This can be achieved through the electronic data conversion technique called Optical Character Recognition (OCR). OCR of some documents is difficult because of varied writing styles and poor scanned-image quality; deep learning techniques can address these problems and improve accuracy. In this work, we employ Long Short-Term Memory (LSTM) for English OCR to enable paperless, effortless data storage and fast access. The records may still contain entities such as names, contact details, drug details, diseases, educational qualifications, and dates. OCR alone cannot separate these entities; an entity recognition framework is needed for deeper and faster data analysis. For efficient Named Entity Recognition (NER), we use an Adaptive Neuro-Fuzzy Inference System (ANFIS), supported by the CRF and BERT algorithms, which automatically labels each entity after training on a large amount of unlabeled text data. The ANFIS model combines linguistic and numerical knowledge. It is more accurate than an artificial neural network (ANN) at identifying patterns and classifying data, and it is more transparent to the user. Our proposed framework aims to improve the performance of the character recognition system by using a feed-forward network. One of the main issues identified in developing this system is noise. The network provides a single input layer and a single output layer. The system has two main components, a training section and a recognition section. Both center on image acquisition and feature extraction, and they also cover training and simulating the classifier. The first step in image recognition is to extract features from the normalized image matrix. We then train the network using the proposed training algorithm. Experiments on medical records achieve a high accuracy of 0.9637, a recall of 0.9627, and an F1 score of 0.9627.
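
The abstract reports accuracy, recall, and F1 but does not say how they are computed. As a minimal sketch only, assuming token-level scoring in which every label other than a hypothetical outside tag `"O"` counts as an entity, the three scores could be derived as follows. The function name `token_metrics`, the tag names, and the sample labels are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Sequence


def token_metrics(gold: Sequence[str], pred: Sequence[str],
                  outside: str = "O") -> Dict[str, float]:
    """Token-level accuracy, precision, recall and F1 for entity labels.

    Assumption: any label different from `outside` is an entity label, and
    a prediction is a true positive only if it matches the gold entity label.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one token is required")

    correct = tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            correct += 1
            if g != outside:
                tp += 1          # entity token labelled correctly
        else:
            if g != outside:
                fn += 1          # gold entity token missed or mislabelled
            if p != outside:
                fp += 1          # predicted entity label that is wrong

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical labels for six tokens of a medical record.
    gold = ["NAME", "O", "DRUG", "O", "DATE", "O"]
    pred = ["NAME", "O", "O", "O", "DATE", "DRUG"]
    print(token_metrics(gold, pred))
```

Entity-level scoring, where a multi-token entity counts only if all of its tokens are correct, is a stricter alternative and would generally give different numbers from this token-level sketch.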

Orthographic Case Restoration Using Supervised Learning Without Manual Annotation

Incorporating External POS Tagger for Punctuation Restoration

Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining

Struck-Out Handwritten Word Detection and Restoration for Automatic Descriptive Answer Evaluation

A Supervised Machine Learning Approach for Post-OCR Error Detection for Historical Text

Statistical Learning for OCR Text Correction

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

Towards Unsupervised Speech Recognition Without Pronunciation Models

Ultrasonic Image's Annotation Removal: A Self-supervised Noise2Noise Approach

Learning to Read by Spelling: Towards Unsupervised Text Recognition

UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction

Post Text Processing of Chinese Speech Recognition Based on Bidirectional LSTM Networks and CRF

An Adaptive Post-processing Method using Proofreading Information for Chinese Character Recognition

An Efficient Architecture for Predicting the Case of Characters using Sequence Models

Accelerating Clinical Text Annotation in Underrepresented Languages: A Case Study on Text De-Identification

Automatic Speech Recognition Post-Processing for Readability: Task, Dataset and a Two-Stage Pre-Trained Approach

Unsupervised Structure-Texture Separation Network for Oracle Character Recognition

OCR Post Correction for Endangered Language Texts

An offline English optical character recognition and NER using LSTM and adaptive neuro-fuzzy inference system

CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction

Neural OCR Post-Hoc Correction of Historical Corpora