An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT

Sondos Aabed, Ahmad Khairaldin
2024-06-22
Abstract: An end-to-end, segmentation-free deep learning model trained from scratch is proposed for Arabic handwriting recognition (AHR). It uses a Deep Convolutional Neural Network (DCNN) for feature extraction, a Bidirectional Long Short-Term Memory (BLSTM) network for sequence recognition, and the Connectionist Temporal Classification (CTC) loss function. It is trained on the KFUPM Handwritten Arabic TexT (KHATT) database. On the test set, the model achieves a recognition rate of 84% at the character level and 71% at the word level. This establishes an image-based sequence recognition framework that operates on whole text lines, without segmenting them into words or characters. The analysis and preprocessing of the KHATT database are also presented. Finally, image processing techniques, including filtering, transformation, and line segmentation, are implemented. The work has wide-ranging applications, including digitization, documentation, archiving, and text translation in fields such as banking. AHR is also a key tool for making images searchable, improving information retrieval, and enabling easy editing. This significantly reduces the time and effort needed to organize and manipulate Arabic data.
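The abstract lists line segmentation among the preprocessing steps but does not say how it is done. One common, simple approach is the horizontal projection profile. It counts the ink pixels in each row of a binarized page and treats each run of inked rows as one text line. The sketch below makes several assumptions. The image is binarized and given as rows of 0/1 values, with 1 meaning ink. The function name `segment_lines` and its `min_ink` and `min_height` thresholds are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Tuple


def segment_lines(
    binary: List[List[int]], min_ink: int = 1, min_height: int = 2
) -> List[Tuple[int, int]]:
    """Hypothetical projection-profile line segmenter (not the paper's code).

    binary: 2-D image as rows of 0/1 values, where 1 marks ink.
    min_ink: a row belongs to a text line if it has at least this many ink pixels.
    min_height: bands shorter than this many rows are dropped as noise.
    Returns (top, bottom) row ranges, with bottom exclusive.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary]
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # entering a band of inked rows
        elif count < min_ink and start is not None:
            if y - start >= min_height:
                lines.append((start, y))
            start = None  # leaving the band
    # Close a band that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines
```

Plain projection profiles are fragile on Arabic handwriting. Dots, ascenders, and descenders can bridge the gap between lines or form small bands of their own. Skewed or overlapping lines also defeat a single horizontal cut. Practical pipelines therefore usually add deskewing, profile smoothing, or merging of small bands.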
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses Arabic handwritten text recognition, a challenging problem in computer vision and artificial intelligence. Its goal is to develop an end-to-end, segmentation-free deep learning model that recognizes images of Arabic handwritten text. The specific problems it targets are:

1. **Complexity of Arabic Handwritten Text**: Arabic script has rich writing variations. The same letter takes different forms depending on its position in a word, and some letters are distinguished only by the number and position of their dots. These characteristics make Arabic handwriting very difficult to recognize.
2. **Character Overlap**: Arabic letters connect to one another, so some writing styles cause characters to overlap vertically. Two characters may occupy the same vertical line, and characters from different words may even overlap. This further increases the difficulty of recognition.
3. **Segmentation Challenges**: Traditional text recognition methods usually segment the text into units such as words, subwords (parts of Arabic words, or PAWs), or individual characters. For Arabic handwriting, this segmentation is extremely complex and error-prone.
4. **Lack of Effective Solutions**: Arabic has historically lagged behind in digital text recognition. This is partly because the dominance of English in the industry has led to the neglect of other languages. As a result, existing solutions often fail to handle the complexity of Arabic handwritten text effectively.

To address these issues, the paper proposes an end-to-end, segmentation-free deep learning model.
This model uses a Deep Convolutional Neural Network (DCNN) for feature extraction. It combines this with a Bidirectional Long Short-Term Memory (BLSTM) network and the Connectionist Temporal Classification (CTC) loss function for sequence recognition. On the KHATT database, it achieves a character-level recognition rate of 84% and a word-level recognition rate of 71%. The paper also details the analysis and preprocessing of the KHATT database. It further describes the image processing techniques (filtering, transformation, and line segmentation) used to prepare input images for prediction.
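In a CNN-BLSTM-CTC model, the network outputs a probability distribution at every horizontal position (time step) of the line image. The distribution covers the alphabet plus a special blank symbol. CTC lets training proceed without aligning these time steps to individual characters, which is what makes the model segmentation-free.

At inference, the simplest decoder is best-path (greedy) decoding. It takes the most likely class at each step, collapses repeated classes, and removes blanks. The sketch below shows that decoder, plus one common way to score output at the character and word levels: 1 minus the total edit distance divided by the total reference length.

Several things here are assumptions for illustration: the helper names, the use of index 0 for the blank, and this definition of "recognition rate". The paper may decode differently (for example, with beam search) or use a different metric.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding (hypothetical helper).

    frame_probs[t][k] is the probability of class k at time step t;
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # Most probable class at each time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for k in best:
        # Keep a class only if it differs from the previous step
        # (collapse repeats) and is not the blank.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = cur
    return prev[-1]


def recognition_rate(refs: List[str], hyps: List[str], level: str = "char") -> float:
    """1 - total edit distance / total reference length (assumed metric).

    level="char" compares character sequences; level="word" compares
    whitespace-separated word sequences.
    """
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        if level == "char":
            r, h = list(ref), list(hyp)
        else:
            r, h = ref.split(), hyp.split()
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references are empty")
    return 1.0 - errors / total
```

Consider frames whose most likely classes are ك, ك, blank, ت, ب. They decode to كتب: the repeated ك collapses and the blank is dropped. A blank between two identical classes is what lets CTC emit a genuinely doubled letter.

Under this metric, the word-level rate is usually lower than the character-level rate. A single wrong character makes the whole word count as an error.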