Advanced Arabic Alphabet Sign Language Recognition Using Transfer Learning and Transformer Models

Mazen Balat, Rewaa Awaad, Hend Adel, Ahmed B. Zaky, Salah A. Aly
2024-10-01
Abstract: This paper presents an approach to Arabic Alphabet Sign Language recognition that combines deep learning with transfer learning and transformer-based models. We study the performance of several model variants on two publicly available datasets, ArSL2018 and AASL. We use state-of-the-art CNN architectures (ResNet50, MobileNetV2, and EfficientNetB7) alongside recent transformer models (Google ViT and Microsoft Swin Transformer). These pre-trained models are fine-tuned on the two datasets to capture the distinctive features of Arabic sign language gestures. Experimental results show that the proposed methodology achieves recognition accuracy of up to 99.6% on ArSL2018 and 99.43% on AASL, well beyond previously reported state-of-the-art approaches. This performance opens further avenues for more accessible communication for Arabic-speaking deaf and hard-of-hearing people, and thus supports a more inclusive society.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper aims to improve the accuracy and efficiency of Arabic Alphabet Sign Language (ArASL) recognition. Specifically, the authors set out to build an accurate ArASL recognition system by combining deep learning, transfer learning, and Transformer-based models. Beyond raising recognition accuracy, such a system is intended to serve as a practical communication tool for deaf and hard-of-hearing people in the Arabic-speaking world and to promote social inclusion.

### Main problems

1. **Technical challenges**: Traditional machine-learning methods struggle to capture the subtle differences and the diversity of gestures in Arabic Alphabet Sign Language, so more advanced deep-learning models are needed to improve recognition performance.
2. **Dataset limitations**: Existing ArASL datasets are small and incompletely labeled, which limits how well models can be trained. The authors therefore use two publicly available datasets (ArSL2018 and AASL) and pre-process them before training.
3. **Application scenarios**: How to integrate advanced sign-language recognition into mainstream devices to improve the education, work, and social participation of deaf and hard-of-hearing people.

### Solutions

- **Transfer learning**: Use pre-trained CNN models (such as ResNet50, MobileNetV2, and EfficientNetB7) and Transformer models (such as Google ViT and Microsoft Swin Transformer), and fine-tune them for the ArASL task.
- **Multi-model comparison**: Compare the performance of the different models on the two datasets to find the architecture best suited to ArASL recognition.
- **Data pre-processing**: Ensure the quality and consistency of the input data through class balancing, grayscale conversion, image scaling, and pixel-value normalization.

### Experimental results

The proposed method, based on transfer learning and Transformer models, achieves test accuracies of 99.6% on ArSL2018 and 99.43% on AASL, far exceeding the best results of existing methods. Besides improving recognition accuracy, the approach points to new directions and techniques for future research.

### Social significance

Through this research, the authors hope to promote the development of more intelligent and efficient assistive technologies in Arabic-speaking countries and regions, helping deaf and hard-of-hearing people integrate into society and enjoy equal access to information and services.
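
The pre-processing steps named under Solutions (class balancing, grayscale conversion, image scaling, and pixel-value normalization) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline. The function names, the 64-pixel target size, the nearest-neighbour resize, the BT.601 grayscale weights, and balancing by undersampling are all assumptions made for illustration, since the summary does not specify them.

```python
import numpy as np


def to_grayscale(img):
    """Convert an H x W x 3 RGB array to H x W (assumed BT.601 luma weights)."""
    if img.ndim == 2:  # already single-channel
        return img.astype(np.float32)
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return img[..., :3].astype(np.float32) @ weights


def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D array to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def preprocess(img, size=64):
    """Grayscale, resize, and scale 8-bit pixel values into [0, 1].

    The default size of 64 is a hypothetical choice, not taken from the paper.
    """
    gray = to_grayscale(img)
    small = resize_nearest(gray, size)
    return small / 255.0


def balance_classes(labels, seed=0):
    """Return sorted indices that undersample every class to the smallest class size.

    Undersampling is one possible balancing strategy, assumed here for illustration.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n = counts.min()
    keep = [
        rng.choice(np.flatnonzero(labels == c), size=n, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(keep))
```

In use, `balance_classes` would first select a balanced subset of sample indices, and `preprocess` would then be applied to each selected image before it is passed to a fine-tuned model. A real pipeline would more likely use an interpolating resize from an imaging library and match the input size and channel count expected by the chosen pre-trained network.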