Abstract: Handwritten Text Recognition has challenged researchers for the last few decades, particularly within computer vision and pattern recognition. Variability in handwriting between writers, cursive script, differing writing styles, and the degradation of historical text images all make the problem difficult. Neural network-based systems for recognizing scanned document images typically use a two-step approach: segmentation followed by recognition. This approach has several drawbacks: text regions are difficult to identify, page layouts vary widely, and accurate ground-truth segmentation is hard to establish. These steps are therefore error-prone and become a bottleneck for high recognition accuracy. In this study, we present an end-to-end paragraph recognition system that incorporates internal line segmentation and an encoder based on gated convolutional layers. Gating is a mechanism that controls the flow of information and allows handwritten text recognition models to adaptively select the more relevant features. The attention module performs the internal line segmentation, allowing the page to be processed line by line. At the decoding stage, we integrate a connectionist temporal classification-based word beam search decoder as a post-processing step. This work extends the existing LexiconNet by applying gated convolutional layers within its deep neural network. Our results at both line and page levels favour the new GatedLexiconNet. We report character error rates of 2.27% on IAM, 0.9% on RIMES, and 2.13% on READ-2016, and word error rates of 5.73% on IAM, 2.76% on RIMES, and 6.52% on READ-2016.
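The gating idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact gate formulation used in GatedLexiconNet, so the code below assumes one common variant: a feature convolution multiplied element-wise by a sigmoid-activated gate convolution over the same input. It works on a 1-D list in pure Python for readability; the function names (`conv1d`, `gated_conv1d`) and the kernels are hypothetical and chosen only for illustration.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D cross-correlation of a list of floats with a kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_conv1d(signal, feature_kernel, gate_kernel):
    """Assumed gating scheme: features * sigmoid(gate), element-wise.

    The gate values lie in (0, 1), so each feature is scaled by how
    relevant the gate branch judges that position to be.
    """
    features = conv1d(signal, feature_kernel)
    gates = [sigmoid(g) for g in conv1d(signal, gate_kernel)]
    return [f * g for f, g in zip(features, gates)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    # A zero gate kernel gives sigmoid(0) = 0.5, so every feature is halved.
    print(gated_conv1d(x, [1.0, 1.0], [0.0, 0.0]))
```

In a trained network both kernels are learned, so the gate branch can pass features through almost unchanged at some positions and suppress them at others; a strongly negative gate response drives the output towards zero.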

Fast Recurrent Neural Network with Bi-LSTM for Handwritten Tamil text segmentation in NLP

HANDWRITTEN CHARACTER RECOGNITION USING CONVOLUTIONAL NEURAL NETWORKS

An Automatic Tamil Speech Recognition system by using Bidirectional Recurrent Neural Network with Self-Organizing Map

Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation

Refocus attention span networks for handwriting line recognition

Pay Attention to What You Read: Non-recurrent Handwritten Text-Line Recognition

Handwritten Tamil Character Recognition Using Deep Learning

Optimally configured convolutional neural network for Tamil Handwritten Character Recognition by improved lion optimization model

A hypothesize-and-verify framework for Text Recognition using Deep Recurrent Neural Networks

Tamil OCR Conversion from Digital Writing Pad Recognition Accuracy Improves through Modified Deep Learning Architectures

GatedLexiconNet: A Comprehensive End-to-End Handwritten Paragraph Text Recognition System

A recurrent neural network based deep learning model for text and non-text stroke classification in online handwritten Devanagari document

An Efficient End-to-End Neural Model for Handwritten Text Recognition

AttaCut: A Fast and Accurate Neural Thai Word Segmenter

ANN-based Innovative Segmentation Method for Handwritten text in Assamese

A Comprehensive Handwritten Paragraph Text Recognition System: LexiconNet

Text Line Segmentation from Struck-out Handwritten Document Images

A novel nearest interest point classifier for offline Tamil handwritten character recognition

ITERATED DILATED CONVOLUTIONAL NEURAL NETWORKS FOR WORD SEGMENTATION

Deep Learning Model for Tamil Part-of-Speech Tagging

Cross Lingual Handwritten Character Recognition Using Long Short Term Memory Network with aid of Elephant Herding Optimization Algorithm