Abstract: Sign language is commonly used by deaf or mute people to communicate but requires extensive effort to master. It is usually performed with fast yet delicate movements of hand gestures, body posture, and even facial expressions. Current Sign Language Recognition (SLR) methods usually extract features via deep neural networks and suffer from overfitting due to limited and noisy data. Recently, skeleton-based action recognition has attracted increasing attention because it is invariant to subject and background, whereas skeleton-based SLR remains underexplored due to the lack of hand annotations. Some researchers have tried to use off-line hand pose trackers to obtain hand keypoints and aid in recognizing sign language via recurrent neural networks. Nevertheless, none of these approaches outperforms RGB-based approaches yet. To this end, we propose a novel Skeleton Aware Multi-modal Framework with a Global Ensemble Model (GEM) for isolated SLR (SAM-SLR-v2) to learn and fuse multi-modal feature representations towards a higher recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics of skeleton keypoints and a Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. The skeleton-based predictions are fused with other RGB- and depth-based modalities by the proposed late-fusion GEM to provide global information and make a faithful SLR prediction. Experiments on three isolated SLR datasets demonstrate that our proposed SAM-SLR-v2 framework is effective and achieves state-of-the-art performance by significant margins. Our code will be available at https://github.com/jackyjsy/SAM-SLR-v2
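
The late-fusion idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic weighted-sum fusion of per-modality class scores, not the authors' GEM; the function names, modality names, scores, and weights below are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def late_fusion(scores: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-modality class scores (generic late fusion)."""
    num_classes = len(next(iter(scores.values())))
    fused = [0.0] * num_classes
    for modality, class_scores in scores.items():
        if len(class_scores) != num_classes:
            raise ValueError(f"{modality}: expected {num_classes} scores")
        w = weights[modality]  # hypothetical per-modality weight
        for i, s in enumerate(class_scores):
            fused[i] += w * s
    return fused


def predict(scores: Dict[str, List[float]], weights: Dict[str, float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = late_fusion(scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical per-modality scores for a 3-class toy problem.
scores = {
    "skeleton": [0.2, 0.7, 0.1],
    "rgb": [0.5, 0.4, 0.1],
    "depth": [0.3, 0.3, 0.4],
}
# Hypothetical weights; in practice these would be tuned or learned.
weights = {"skeleton": 1.0, "rgb": 0.5, "depth": 0.25}

print(predict(scores, weights))
```

Each modality is scored independently and combined only at the prediction stage, which is what distinguishes late fusion from fusing features inside a single network.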

A Sign Language Recognition Framework Based on Cross-Modal Complementary Information Fusion

Boosting Continuous Sign Language Recognition via Cross Modality Augmentation

Manual and non-manual sign language recognition framework using hybrid deep learning techniques

Online Early-Late Fusion Based on Adaptive HMM for Sign Language Recognition

Combinational sign language recognition

SignVTCL: Multi-Modal Continuous Sign Language Recognition Enhanced by Visual-Textual Contrastive Learning

Sign Language Recognition with Multi-modal Features

Sign Language Recognition with Long Short-Term Memory

CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

TMS-Net: A multi-feature multi-stream multi-level information sharing network for skeleton-based sign language recognition

A Vision-Based Sign Language Recognition System Using Tied-Mixture Density HMM

Enhancing Signer-Independent Recognition of Isolated Sign Language through Advanced Deep Learning Techniques and Feature Fusion

Natural Language-Assisted Sign Language Recognition

Hear Sign Language: A Real-Time End-to-End Sign Language Recognition System

Difference-guided multi-scale spatial-temporal representation for sign language recognition

Collaborative Multilingual Continuous Sign Language Recognition: A Unified Framework

MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition

Sign Language Recognition Based on Adaptive HMMs with Data Augmentation

Chinese sign language recognition with adaptive HMM

A Chinese Sign Language Recognition System Based on SOFM/SRN/HMM

Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble