A self-attention-based deep architecture for online handwriting recognition

Seyed Alireza Molavi, Bagher BabaAli
DOI: https://doi.org/10.1007/s00521-024-10015-6
2024-06-08
Neural Computing and Applications
Abstract: In recent years, the self-attention mechanism has been the most frequently used and efficient way of processing and learning sequences in numerous domains of artificial intelligence, including natural language processing, automatic speech recognition, and computer vision. It has a strong ability to learn the dependencies between points of the input sequence, particularly those that are far apart, and it also allows the sequence to be processed in parallel. As a result, when applied to sequences, this mechanism extracts an appropriate representation from the input faster than other approaches such as recurrent neural networks. Despite these benefits, recurrent neural networks combined with feature engineering have been the most commonly employed approach to online handwriting recognition. This study introduces an end-to-end online handwriting recognition system that incorporates the self-attention mechanism into three different modeling methods: CTC-based, RNN-T, and encoder–decoder. The proposed system can recognize handwritten scripts without the need for feature engineering. The system's performance was evaluated on the Arabic Online-KHATT dataset and the English IAM-OnDB dataset. On the former, it achieved a character error rate (CER) of 4.78% and a word error rate (WER) of 20.63%; on the latter, it achieved a CER of 4.10% and a WER of 14.31%. Both results were noticeably better than those previously reported. Additionally, the Persian Online Handwriting Database was used for experimental validation, resulting in a CER of 8.03% and a WER of 28.39%.
computer science, artificial intelligence
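The abstract reports its results as character error rate (CER) and word error rate (WER). As a rough illustration of how such figures are commonly computed, the sketch below uses the standard edit-distance definition: the number of substitutions, insertions, and deletions needed to turn the recognized text into the reference, divided by the reference length. The function names are hypothetical, and the paper's actual scoring protocol (text normalization, whitespace handling, averaging over the test set) is not stated in the abstract, so this is an assumption rather than the authors' procedure.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words.

    Assumes whitespace tokenization, which may differ from the paper's setup.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ref = "hello world"
    hyp = "hallo world"
    print(f"CER = {cer(ref, hyp):.4f}")
    print(f"WER = {wer(ref, hyp):.4f}")
```

A single wrong character affects only one of many characters but invalidates the whole word that contains it, which is why WER is typically several times higher than CER, as in the figures reported above.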