Isolated Arabic Sign Language Recognition Using A Transformer-based Model and Landmark Keypoints

Sarah Alyami, Hamzah Luqman, Mohammad Hammoudeh
DOI: https://doi.org/10.1145/3584984
Impact factor: 1.471
2023-02-21
ACM Transactions on Asian and Low-Resource Language Information Processing
Abstract: Pose-based approaches to sign language recognition provide lightweight, fast models that can be adopted in real-time applications. This paper presents a framework for isolated Arabic Sign Language (ArSL) recognition using hand and face keypoints. We employed the MediaPipe pose estimator to extract the keypoints of sign gestures from the video stream. Using the extracted keypoints, three models were proposed for sign language recognition: Long Short-Term Memory (LSTM), Temporal Convolutional Network (TCN), and Transformer-based models. Moreover, we investigated the importance of non-manual features for sign language recognition systems, and the results showed that combining hand and face keypoints boosted recognition accuracy by around 4% compared with using hand keypoints alone. The proposed models were evaluated on Arabic and Argentinian sign languages. On the KArSL-100 dataset, the proposed pose-based Transformer achieved the highest accuracies, 99.74% and 68.2% in signer-dependent and signer-independent modes, respectively. Additionally, the Transformer was evaluated on the LSA64 dataset and obtained accuracies of 98.25% and 91.09% in signer-dependent and signer-independent modes, respectively. Overall, using keypoints from the signer's hands and face, the pose-based Transformer outperformed state-of-the-art techniques on both datasets.
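To make the pose-based pipeline more concrete, the following is a minimal, stdlib-only sketch of how per-frame hand and face keypoints might be combined into a sequence and passed through the scaled dot-product self-attention step at the core of a Transformer encoder. It is not the authors' implementation. Several assumptions are labeled here and in the code comments: 21 (x, y) landmarks per hand (the MediaPipe Hands landmark count), a caller-chosen face landmark subset whose size is hypothetical, zero-filling for an undetected hand, uniform resampling of each clip to a fixed number of frames, and identity query/key/value projections in place of learned weights. The function names `frame_features`, `sample_frames`, and `self_attention` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates; z is dropped (assumption)

HAND_POINTS = 21  # landmarks per hand, as produced by MediaPipe Hands


def frame_features(left_hand: Optional[Sequence[Point]],
                   right_hand: Optional[Sequence[Point]],
                   face: Sequence[Point]) -> List[float]:
    """Flatten both hands and a face-landmark subset into one vector.

    A hand that was not detected (None) is zero-filled (an assumption),
    so every frame yields a vector of the same length.
    """
    feats: List[float] = []
    for hand in (left_hand, right_hand):
        if hand is None:
            feats.extend([0.0] * (2 * HAND_POINTS))
        else:
            if len(hand) != HAND_POINTS:
                raise ValueError("expected 21 hand landmarks")
            for x, y in hand:
                feats.extend((x, y))
    for x, y in face:  # face subset size is chosen by the caller (hypothetical)
        feats.extend((x, y))
    return feats


def sample_frames(seq: Sequence[List[float]], target_len: int) -> List[List[float]]:
    """Uniformly resample a variable-length clip to target_len frames."""
    n = len(seq)
    if n == 0 or target_len < 1:
        raise ValueError("need a non-empty clip and target_len >= 1")
    if target_len == 1:
        return [seq[0]]
    return [seq[int(i * (n - 1) / (target_len - 1) + 0.5)] for i in range(target_len)]


def self_attention(x: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with identity projections (Q = K = V = x)."""
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out: List[List[float]] = []
    for q in x:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output frame is an attention-weighted mix of all frames.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


# Toy clip: 40 frames, both hands visible, 10 hypothetical face points.
frames = []
for t in range(40):
    hand = [(0.01 * t, 0.02 * i) for i in range(HAND_POINTS)]
    face = [(0.5, 0.3 + 0.01 * i) for i in range(10)]
    frames.append(frame_features(hand, hand, face))

clip = sample_frames(frames, 16)
encoded = self_attention(clip)
print(len(clip), len(clip[0]), len(encoded))  # prints: 16 104 16
```

In a trained model, learned projections, multiple heads, positional encodings, feed-forward layers, and a classification head over the sign vocabulary would sit around this attention step; the sketch only shows how a keypoint sequence flows through it.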
computer science, artificial intelligence
What problem does this paper attempt to address?