Enhancing ASL Recognition with GCNs and Successive Residual Connections

Ushnish Sarkar, Archisman Chakraborti, Tapas Samanta, Sarbajit Pal, Amitabha Das
2024-08-19
Abstract: This study presents a novel approach for enhancing American Sign Language (ASL) recognition using Graph Convolutional Networks (GCNs) integrated with successive residual connections. The method leverages the MediaPipe framework to extract key landmarks from each hand gesture, which are then used to construct graph representations. A robust preprocessing pipeline, including translational and scale normalization techniques, ensures consistency across the dataset. The constructed graphs are fed into a GCN-based neural architecture with residual connections to improve network stability. The architecture achieves state-of-the-art results, demonstrating superior generalization capabilities with a validation accuracy of 99.14%.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to improve the accuracy of American Sign Language (ASL) recognition. The authors propose combining Graph Convolutional Networks (GCNs) with successive residual connections. The main problems and solutions are summarized below.

### Research Background

1. **Limitations of Traditional Methods**:
   - Traditional ASL recognition methods rely mainly on image processing techniques and conventional machine-learning algorithms. These methods perform poorly on the complexity and diversity of hand postures, especially fine finger movements and orientations.
   - Although Convolutional Neural Networks (CNNs) can learn hierarchical features from image data, they are limited in capturing non-Euclidean relationships between hand key points.
2. **Advantages of Graph Convolutional Networks (GCNs)**:
   - GCNs naturally represent and process graph-structured data and can model the spatial dependencies between hand key points more effectively, which lets them capture complex geometric relationships better.

### Proposed Method

To overcome these limitations, the authors propose the following:

1. **Key Point Extraction**:
   - The MediaPipe framework extracts 21 key points (landmarks) from each hand posture. These key points serve as the nodes of the graph.
2. **Graph Construction**:
   - A graph is built from the extracted key points: each node corresponds to a key point, and the edges represent the spatial relationships between key points.
3. **GCN Architecture**:
   - The constructed graph is fed into a GCN-based neural network architecture. The architecture uses residual connections to mitigate vanishing and exploding gradients and to improve network stability.
4. **Pre-processing Steps**:
   - Translation normalization and scale normalization are applied to keep the dataset consistent.
5. **Experimental Verification**:
   - A rigorous evaluation, including 5-fold cross-validation, was carried out on the ASL alphabet dataset. The model achieved a validation accuracy of 99.14%, which the authors report as significantly better than existing methods.

### Formula Presentation

- **Angle Calculation Formula**:
  \[ \text{Angle} = \arccos\left(\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}\right) \]
  where \(\mathbf{u}\) and \(\mathbf{v}\) are vectors formed by key points.
- **Scaling Factor Calculation**:
  \[ s = \frac{d_{\text{desired}}}{d_{\text{max}}} \]
  where \(d_{\text{max}}\) is the maximum distance in the current landmark set, and \(d_{\text{desired}}\) is the desired maximum distance.
- **GCN Update Step**:
  \[ \mathbf{h}_{\text{out}} = D^{-\frac{1}{2}}(A + I)D^{-\frac{1}{2}}Z_{\text{out}} \]
  where \(D\) is the degree matrix, \(A\) is the adjacency matrix, \(I\) is the identity matrix, and \(Z_{\text{out}}\) is the output of the aggregation step.

### Conclusion

By introducing GCNs with successive residual connections, this study improves ASL recognition performance, reaching a validation accuracy of 99.14% and providing a new benchmark and framework for future research.
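The pre-processing steps and the angle and scaling formulas can be illustrated with a short NumPy sketch. This is not the authors' code. It assumes that translation normalization means shifting the wrist (landmark 0) to the origin, that \(d_{\text{max}}\) is the largest pairwise distance between landmarks, and that \(d_{\text{desired}} = 1\). The function names and the random stand-in data are hypothetical.

```python
import numpy as np

def translate_to_origin(landmarks, ref_index=0):
    """Shift all landmarks so the reference landmark (assumed: wrist) is at the origin."""
    return landmarks - landmarks[ref_index]

def scale_to_max_distance(landmarks, d_desired=1.0):
    """Rescale by s = d_desired / d_max, with d_max assumed to be the largest pairwise distance."""
    diffs = landmarks[:, None, :] - landmarks[None, :, :]
    d_max = np.linalg.norm(diffs, axis=-1).max()
    if d_max == 0:
        return landmarks.copy()  # degenerate case: all landmarks coincide
    return landmarks * (d_desired / d_max)

def angle_between(u, v):
    """Angle in radians between u and v: arccos(u . v / (|u| |v|))."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding error

# Hypothetical stand-in for one hand: 21 random 3D points, not real MediaPipe output.
rng = np.random.default_rng(0)
raw = rng.normal(size=(21, 3))
normalized = scale_to_max_distance(translate_to_origin(raw))
```

After both steps, the reference landmark sits at the origin and the hand's largest extent equals `d_desired`, so gestures captured at different positions and distances from the camera become comparable.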
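The GCN update step and the residual connections can likewise be sketched in NumPy. Several parts are assumptions rather than details from the paper: the edge list is a generic hand-skeleton layout over 21 landmarks, \(D\) is taken as the degree matrix of \(A + I\) (the common convention), \(Z_{\text{out}}\) is taken as the linearly transformed features \(HW\), and the residual is a plain additive skip after a ReLU. All names are hypothetical.

```python
import numpy as np

# Assumed skeleton edges over 21 hand landmarks (0 = wrist, then four joints per finger).
EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),
    (0, 5), (5, 6), (6, 7), (7, 8),
    (5, 9), (9, 10), (10, 11), (11, 12),
    (9, 13), (13, 14), (14, 15), (15, 16),
    (13, 17), (17, 18), (18, 19), (19, 20),
    (0, 17),
]

def normalized_adjacency(edges, num_nodes=21):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    A_tilde = A + np.eye(num_nodes)   # add self-loops
    deg = A_tilde.sum(axis=1)         # assumption: degrees include the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_residual_layer(H, W, A_hat):
    """One layer: ReLU(A_hat @ (H @ W)) + H. W must be square so the skip shapes match."""
    Z = H @ W                          # assumed form of the aggregation input Z_out
    return np.maximum(A_hat @ Z, 0.0) + H

rng = np.random.default_rng(0)
A_hat = normalized_adjacency(EDGES)
H = rng.normal(size=(21, 16))          # hypothetical node features
W = rng.normal(size=(16, 16)) * 0.1    # hypothetical layer weights
H_next = gcn_residual_layer(gcn_residual_layer(H, W, A_hat), W, A_hat)  # two successive blocks
```

The additive skip gives gradients a direct path around each graph convolution, which is the usual reason residual connections help with stability when layers are stacked.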