Abstract: This study presents a novel approach for enhancing American Sign Language (ASL) recognition using Graph Convolutional Networks (GCNs) integrated with successive residual connections. The method leverages the MediaPipe framework to extract key landmarks from each hand gesture, which are then used to construct graph representations. A robust preprocessing pipeline, including translational and scale normalization techniques, ensures consistency across the dataset. The constructed graphs are fed into a GCN-based neural architecture with residual connections to improve network stability. The architecture achieves state-of-the-art results, demonstrating superior generalization capabilities with a validation accuracy of 99.14%.

What problem does this paper attempt to address?

This paper aims to improve the accuracy of American Sign Language (ASL) recognition. Specifically, the authors propose a new method that combines Graph Convolutional Networks (GCNs) with successive residual connections. The main problems and solutions in this study are as follows:

### Research Background

1. **Limitations of Traditional Methods**:
   - Traditional ASL recognition methods rely mainly on image processing techniques and conventional machine-learning algorithms. These methods perform poorly on complex and diverse hand postures, especially fine finger movements and orientations.
   - Although Convolutional Neural Networks (CNNs) can learn hierarchical features from image data, they are limited in capturing non-Euclidean relationships between hand key points.
2. **Advantages of Graph Convolutional Networks (GCNs)**:
   - GCNs naturally represent and process graph-structured data and can model the spatial dependencies between hand key points more effectively, thus better capturing complex geometric relationships.

### Proposed Method

To overcome these problems, the authors propose the following method:

1. **Key Point Extraction**:
   - The MediaPipe framework extracts 21 key points (landmarks) from each hand posture; these key points serve as the nodes of the graph.
2. **Graph Construction**:
   - A graph is constructed from the extracted key points, where each node corresponds to a key point and the edges represent the spatial relationships between key points.
3. **GCN Architecture**:
   - The constructed graph is fed into a GCN-based neural network architecture. This architecture uses residual connections to alleviate vanishing and exploding gradients and to improve the stability of the network.
4. **Preprocessing Steps**:
   - Translation normalization and scale normalization ensure consistency across the dataset.
5. **Experimental Verification**:
   - The model was evaluated rigorously on the ASL alphabet dataset, including 5-fold cross-validation. It achieved a validation accuracy of 99.14%, which is reported as significantly better than existing methods.

### Formula Presentation

- **Angle Calculation Formula**:
  \[ \text{Angle}=\arccos\left(\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}||\mathbf{v}|}\right) \]
  where \(\mathbf{u}\) and \(\mathbf{v}\) are vectors formed by key points.
- **Scaling Factor Calculation**:
  \[ s = \frac{d_{\text{desired}}}{d_{\text{max}}} \]
  where \(d_{\text{max}}\) is the maximum distance in the current landmark set, and \(d_{\text{desired}}\) is the desired maximum distance.
- **GCN Update Step**:
  \[ \mathbf{h}_{\text{out}}=D^{-\frac{1}{2}}(A + I)D^{-\frac{1}{2}}Z_{\text{out}} \]
  where \(D\) is the degree matrix, \(A\) is the adjacency matrix, \(I\) is the identity matrix, and \(Z_{\text{out}}\) is the output of the aggregation step.

### Conclusion

By introducing GCNs with successive residual connections, this study improves the performance of ASL recognition, achieving a validation accuracy of 99.14% and providing a new benchmark and framework for future research.
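The preprocessing and angle formulas summarized above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. Several choices here are assumptions, since the summary does not specify them: landmarks are treated as 2D `(x, y)` tuples, translation is taken relative to a reference landmark (index 0, which is the wrist in MediaPipe's hand model), and \(d_{\text{max}}\) is interpreted as the largest pairwise distance between landmarks. The function names are hypothetical.

```python
import math


def translate_to_origin(landmarks, ref_index=0):
    """Shift all landmarks so the reference landmark sits at the origin.

    Assumption: the reference is landmark 0 (the wrist in MediaPipe).
    """
    rx, ry = landmarks[ref_index]
    return [(x - rx, y - ry) for x, y in landmarks]


def scale_to_max_distance(landmarks, d_desired=1.0):
    """Rescale landmarks by s = d_desired / d_max.

    Assumption: d_max is the largest pairwise distance in the set.
    """
    d_max = max(math.dist(p, q) for p in landmarks for q in landmarks)
    if d_max == 0:
        # Degenerate case: all landmarks coincide, nothing to scale.
        return list(landmarks)
    s = d_desired / d_max
    return [(x * s, y * s) for x, y in landmarks]


def angle_between(u, v):
    """Angle in radians between vectors u and v: arccos(u.v / (|u||v|))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    # Clamp to [-1, 1] to guard against floating-point drift.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.acos(cos_theta)


# Toy example with three landmarks instead of the full 21.
raw = [(2.0, 3.0), (5.0, 7.0), (2.0, 5.0)]
normalized = scale_to_max_distance(translate_to_origin(raw))
```

After both steps, the reference landmark lies at the origin and the two landmarks that are farthest apart are exactly `d_desired` apart, so hands of different sizes and image positions become comparable.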
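The GCN update step given above, combined with a residual connection, can be sketched as follows. This is a simplified illustration in plain Python, not the paper's architecture. The assumptions are: \(D\) is computed from \(A + I\) (the common convention, which avoids division by zero for isolated nodes), \(Z_{\text{out}}\) is modeled as a linear transform of the node features, ReLU is used as the activation, and the residual is added only when input and output widths match. The graph is a three-node chain rather than the 21-node hand graph, and all function names are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix.

    Assumption: D is the degree matrix of A + I, not of A alone.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(a_norm, h, weight, residual=True):
    """One graph convolution: ReLU(a_norm @ (h @ weight)), plus optional skip."""
    z = matmul(h, weight)        # per-node feature transform (assumed linear)
    out = matmul(a_norm, z)      # propagate features along normalized edges
    out = [[max(0.0, v) for v in row] for row in out]  # ReLU activation
    if residual and len(h[0]) == len(out[0]):
        # Residual connection: add the layer input back to its output.
        out = [[o + x for o, x in zip(row_out, row_in)]
               for row_out, row_in in zip(out, h)]
    return out


# Toy chain graph 0 - 1 - 2 with two features per node.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity_weight = [[1.0, 0.0], [0.0, 1.0]]

a_norm = normalized_adjacency(adj)
h_out = gcn_layer(a_norm, features, identity_weight)
```

Stacking several such layers, each with its own skip connection, corresponds to the idea of successive residual connections: the identity path gives gradients a direct route through the network. A practical implementation would use a tensor library with learned weights rather than nested lists.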

Enhancing ASL Recognition with GCNs and Successive Residual Connections

Asymmetric multi-branch GCN for skeleton-based sign language recognition

Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)

Sign Language Recognition Using Graph and General Deep Neural Network Based on Large Scale Dataset

Interactive attention and improved GCN for continuous sign language recognition

Mediapipe and CNNs for Real-Time ASL Gesture Recognition

ASL-3DCNN: American sign language recognition technique using 3-D convolutional neural networks

Using CSNNs to Perform Event-based Data Processing & Classification on ASL-DVS

Hypertuned Deep Convolutional Neural Network for Sign Language Recognition

Spatial–temporal attention with graph and general neural network-based sign language recognition

Enhancing Arabic Sign Language Interpretation: Leveraging Convolutional Neural Networks and Transfer Learning

Adaptive Semantic-Spatio-Temporal Graph Convolutional Network for Lip Reading

Recognizing American Sign Language Manual Signs from Rgb-D Videos

Automatic American sign language prediction for static and dynamic gestures using KFM-CNN

A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion

Active convolutional neural networks sign language (ActiveCNN-SL) framework: a paradigm shift in deaf-mute communication

STCN-GR: Spatial-Temporal Convolutional Networks for Surface-Electromyography-Based Gesture Recognition

Real-Time Hand Gesture Recognition Using Fine-Tuned Convolutional Neural Network

A Deep Learning Approach for ASL Recognition and Text-to-Speech Synthesis using CNN

An Efficient Graph Convolution Network for Skeleton-Based Dynamic Hand Gesture Recognition

Dynamical semantic enhancement network for continuous sign language recognition