Abstract: As the number of hearing-impaired people worldwide increases, sign language recognition (SLR) has attracted extensive attention from scholars. To address problems in current SLR research, such as parameter expansion and unsatisfactory feature extraction performance, this paper proposes a novel skeleton-based method. The Asymmetric Multi-branch Graph Convolution Network (AM-GCN), composed of a spatial graph convolution and an Asymmetric Multi-branch Temporal Convolution (MTC), is constructed to acquire and process the graph structure. MTC uses multi-branch dilated convolution to expand the receptive field and enhance information dependence. To extract discriminative spatiotemporal information effectively from a large amount of information, the Spatial and Temporal Fusion Attention module (STFA) is proposed. The STFA maintains spatiotemporal consistency and produces a fused attention map, which substantially facilitates spatiotemporal feature learning. Asymmetric Convolution Channel Attention (ACCA) is used as the channel attention. Experiments on a processed dataset obtained by transforming videos confirm the robustness of the ACCA to image flipping and rotation. The STFA and ACCA jointly form a spatial-temporal-channel attention module that extracts distinguishing features and enhances the model's representation. Finally, the attention module is inserted into the AM-GCN to obtain AM-GCN-A, which is evaluated on the WLASL2000, AUTSL, and CSL datasets. The top-1 accuracy is 57.01%, 96.27%, and 98.20%, respectively. The results are competitive with state-of-the-art methods and demonstrate the effectiveness of the model.
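As a rough illustration of how multi-branch dilated convolution expands the temporal receptive field, the following minimal sketch runs one 1-D convolution per dilation rate over a sequence of frames. It is not the authors' MTC: the kernel size of 3, the dilation rates 1, 2, and 4, the all-ones weights, and the function names are assumptions chosen only for illustration.

```python
def receptive_field(kernel_size, dilation):
    """Number of time steps spanned by one dilated 1-D convolution."""
    return 1 + (kernel_size - 1) * dilation


def dilated_conv1d(x, weights, dilation):
    """Zero-padded ('same' length) 1-D dilated convolution over a list of floats."""
    k = len(weights)
    pad = (k - 1) * dilation // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(weights[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def multi_branch_temporal(x, weights, dilations):
    """Run one branch per dilation rate and return the per-branch outputs."""
    return [dilated_conv1d(x, weights, d) for d in dilations]


# Hypothetical settings (not taken from the paper): kernel size 3, dilations 1, 2, 4.
dilations = [1, 2, 4]
fields = [receptive_field(3, d) for d in dilations]

# A unit impulse at frame 8 of a 17-frame sequence shows which output frames
# each branch connects to that input frame.
impulse = [0.0] * 17
impulse[8] = 1.0
branches = multi_branch_temporal(impulse, [1.0, 1.0, 1.0], dilations)
reach = [[t for t, v in enumerate(out) if v != 0.0] for out in branches]
```

In this sketch `fields` holds the span of each branch in frames, and `reach` lists the output frames influenced by the impulse. Branches with larger dilation rates cover a wider temporal span with the same number of weights, which is the general motivation for combining several dilation rates.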

Global-local Enhancement Network for NMFs-aware Sign Language Recognition

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

Natural Language-Assisted Sign Language Recognition

Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition

TMS-Net: A multi-feature multi-stream multi-level information sharing network for skeleton-based sign language recognition

SF-Net: Structured Feature Network for Continuous Sign Language Recognition

Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition

Difference-guided multi-scale spatial-temporal representation for sign language recognition

Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble

Two-Stream Network for Sign Language Recognition and Translation

Hand-Model-Aware Sign Language Recognition

Sign Language Recognition Based on Adaptive HMMs with Data Augmentation

Sign Language Translation with Hierarchical Spatio-Temporal Graph Neural Network

Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation

Sign language recognition using real-sense

Sign Language Recognition with Long Short-Term Memory

Video-Based Sign Language Recognition Without Temporal Segmentation

Asymmetric multi-branch GCN for skeleton-based sign language recognition
