Real Time American Sign Language Detection Using Yolo-v9

Amna Imran, Meghana Shashishekhara Hulikal, Hamza A. A. Gardi
2024-07-25
Abstract: This paper focuses on real-time American Sign Language detection. YOLO is a convolutional neural network (CNN) based model that was first released in 2015 and has gained popularity in recent years for its real-time detection capabilities. Our study specifically targets the YOLO-v9 model, released in 2024. Because the model is newly introduced, little work has been done on it, especially in sign language detection. Our paper provides deep insight into how YOLO-v9 works and why it is better than previous models.
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper mainly addresses the following issues:

1. **Background and Need**: The paper emphasizes the importance of Sign Language Recognition (SLR), especially for individuals with hearing impairments, for whom sign language is a crucial tool for communication with the world. According to data from the World Health Organization, a large portion of the global population is affected by hearing loss, and this number is expected to continue growing. Therefore, increasing the prevalence of sign language and providing technical support becomes particularly important.
2. **Technical Challenges**: The paper points out the information bottleneck problem in deep neural networks, where some original information is lost as data is transmitted between network layers. This can lead to the model not fully utilizing the input information during prediction, thereby affecting the final performance.
3. **Methodological Innovations**:
   - **YOLO-v9 Model**: The paper focuses on the latest version in the YOLO series, the YOLO-v9 model, which is a real-time object detection algorithm based on Convolutional Neural Networks (CNN). Compared to previous versions, YOLO-v9 has been optimized for handling sign language recognition tasks.
   - **Information Bottleneck Principle and Invertible Functions**: To alleviate the information bottleneck problem, the paper introduces the concept of the information bottleneck principle and invertible functions, aiming to reduce information loss during training and ensure the accuracy of gradient updates.
   - **Programmable Gradient Information (PGI)**: The concept of PGI is proposed to generate reliable gradient information, ensuring that deep features retain key characteristics through auxiliary invertible branches, thereby addressing the issue of information loss during the feedforward process.
   - **Generalized Efficient Layer Aggregation Network (GELAN)**: The GELAN architecture is combined to further optimize the performance of the YOLO-v9 model, particularly in object detection tasks.
4. **Experimental Results**: The paper demonstrates the effectiveness of the YOLO-v9 model in sign language recognition tasks, including comparisons of two variants (YOLO-v9c and YOLO-v9e). The results show that the YOLO-v9 model performs well in terms of accuracy, recall, and mean average precision, especially excelling in real-time sign language recognition applications.

In summary, the paper aims to address the technical challenges in the field of sign language recognition by leveraging the latest YOLO-v9 model and related technological innovations, thereby improving the communication efficiency and quality for the deaf and hard-of-hearing community.
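
The contrast between the information bottleneck and invertible functions described above can be illustrated with a toy example. The sketch below is not the PGI auxiliary branch or any part of YOLO-v9. It assumes a plain ReLU as the lossy layer and an additive coupling layer (a standard invertible construction) as the reversible one. The helper `mix` is a hypothetical stand-in for a learned sub-network.

```python
import math


def relu_layer(x):
    """Non-invertible layer: every negative entry is clamped to zero."""
    return [max(0.0, v) for v in x]


def mix(x):
    """Hypothetical stand-in for a learned sub-network (assumption)."""
    return [math.tanh(2.0 * v + 0.5) for v in x]


def coupling_forward(x):
    """Additive coupling: split x into (a, b) and output (a, b + mix(a))."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return a + [bv + fv for bv, fv in zip(b, mix(a))]


def coupling_inverse(y):
    """Exact inverse of coupling_forward: recover b as c - mix(a)."""
    half = len(y) // 2
    a, c = y[:half], y[half:]
    return a + [cv - fv for cv, fv in zip(c, mix(a))]


# Two different inputs that differ only in their negative entries.
x1 = [-1.5, 0.3, -0.2, 2.0]
x2 = [-0.7, 0.3, -9.0, 2.0]

# The lossy layer maps both inputs to the same output, so nothing
# downstream of it can tell them apart: information has been lost.
relu_collision = relu_layer(x1) == relu_layer(x2)

# The invertible layer lets the original input be reconstructed
# (up to floating-point rounding), so no information is lost.
recovered = coupling_inverse(coupling_forward(x1))
max_error = max(abs(r - v) for r, v in zip(recovered, x1))

print("ReLU maps x1 and x2 to the same output:", relu_collision)
print("Max reconstruction error through coupling layer:", max_error)
```

Once two inputs collide in the lossy layer, no later layer can distinguish them, which is the kind of loss the information bottleneck describes. The coupling layer never has this problem, because its input can always be rebuilt from its output.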