A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

Riaz Ahmad, Saeeda Naz, Muhammad Afzal, Sheikh Rashid, Marcus Liwicki, Andreas Dengel
DOI: https://doi.org/10.34028/iajit/17/3/3
2020-05-01
The International Arab Journal of Information Technology
Abstract: This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper makes three main contributions: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step prunes extra white space and de-skews skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Combining data augmentation with the deep learning approach improves the Character Recognition (CR) rate to 80.02%, compared with a baseline of 75.08%.
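The white-space pruning mentioned in the pre-processing step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `prune_white_margins`, the list-of-rows grayscale representation (0 for ink, 255 for paper), and the `ink_threshold` value of 128 are all assumptions made for illustration. De-skewing is not covered here.

```python
from typing import List

# Assumed representation: a grayscale image as rows of pixel values,
# where 0 is black ink and 255 is white paper.
Image = List[List[int]]


def prune_white_margins(image: Image, ink_threshold: int = 128) -> Image:
    """Crop border rows and columns that contain no ink (hypothetical helper)."""
    if not image or not image[0]:
        return image

    # A pixel counts as ink when it is darker than the assumed threshold.
    ink_rows = [r for r, row in enumerate(image)
                if any(p < ink_threshold for p in row)]
    if not ink_rows:
        return image  # blank image: there is no ink to anchor the crop on

    width = len(image[0])
    ink_cols = [c for c in range(width)
                if any(row[c] < ink_threshold for row in image)]

    # Keep the tight bounding box around all ink pixels.
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

The sketch only trims the outer margins, so white space between words inside the text-line is preserved. A fixed threshold is the simplest choice; scanned handwriting with uneven lighting would likely need a binarization step first.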