Abstract: Sign language recognition (SLR) is one of the crucial applications of hand gesture recognition and computer vision research. Many researchers have worked on hand gesture-based SLR applications for English, Turkish, Arabic, and other sign languages. However, few studies have addressed Korean sign language (KSL) classification because few KSL datasets are publicly available. In addition, existing KSL recognition work still faces efficiency challenges because illumination and background complexity are the major problems in this field. In the last decade, researchers have successfully applied vision transformers to sign language recognition by extracting long-range dependencies within the image. Moreover, there is a significant gap between CNNs and transformers in terms of model performance and efficiency. In addition, we have not yet found a KSL recognition model that combines a CNN and a transformer. To overcome these challenges, we propose a convolution and transformer-based multi-branch network that takes advantage of the long-range dependency computation of the transformer and the local feature calculation of the CNN for sign language recognition. We extract initial features with the grained model and then extract features from the transformer and the CNN in parallel. After concatenating the local and long-range dependency features, a new classification module is applied for the classification. We evaluated the proposed model on a KSL benchmark dataset and our lab dataset, where it achieved 89.00% accuracy on the 77-label KSL dataset and 98.30% accuracy on the lab dataset. The higher performance indicates that the proposed model generalizes well at considerably lower computational cost.
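The multi-branch idea described in the abstract (shared initial features, a CNN branch for local features, a transformer branch for long-range dependencies, concatenation, then a classifier) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the architecture of the paper: the patch-based `stem`, the single 3x3 kernel, the single-head attention, the layer sizes, and all function and parameter names are hypothetical, and the weights are random rather than trained. Only the 77-class output size is taken from the abstract.

```python
import numpy as np

NUM_CLASSES = 77  # label count of the KSL benchmark mentioned in the abstract
PATCH = 4         # assumed patch size for the illustrative stem
DIM = PATCH * PATCH


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stem(image, patch=PATCH):
    """Assumed stand-in for the initial feature extractor:
    split an (H, W) image into non-overlapping patches -> (N, patch*patch)."""
    image = np.asarray(image, dtype=float)
    gh, gw = image.shape[0] // patch, image.shape[1] // patch
    grid = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    tokens = grid.transpose(0, 2, 1, 3).reshape(gh * gw, patch * patch)
    return tokens, (gh, gw)


def cnn_branch(tokens, grid, kernel):
    """Local features: zero-padded 3x3 convolution over the token grid,
    followed by ReLU and global average pooling -> (DIM,)."""
    gh, gw = grid
    fmap = tokens.reshape(gh, gw, -1)
    padded = np.pad(fmap, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(fmap)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy : dy + gh, dx : dx + gw, :]
    return np.maximum(out, 0.0).mean(axis=(0, 1))


def transformer_branch(tokens, wq, wk, wv):
    """Long-range features: single-head scaled dot-product self-attention
    over all tokens, followed by mean pooling -> (DIM,)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return (attn @ v).mean(axis=0)


def init_params(rng):
    """Random, untrained weights; for shape illustration only."""
    return {
        "kernel": rng.normal(size=(3, 3)),
        "wq": rng.normal(size=(DIM, DIM)) * 0.1,
        "wk": rng.normal(size=(DIM, DIM)) * 0.1,
        "wv": rng.normal(size=(DIM, DIM)) * 0.1,
        "w_out": rng.normal(size=(2 * DIM, NUM_CLASSES)) * 0.1,
        "b_out": np.zeros(NUM_CLASSES),
    }


def classify(image, params):
    """Run both branches on shared tokens, concatenate, and classify."""
    tokens, grid = stem(image)
    local_feat = cnn_branch(tokens, grid, params["kernel"])
    global_feat = transformer_branch(tokens, params["wq"], params["wk"], params["wv"])
    fused = np.concatenate([local_feat, global_feat])  # feature fusion
    return softmax(fused @ params["w_out"] + params["b_out"])


rng = np.random.default_rng(0)
probs = classify(rng.random((32, 32)), init_params(rng))
print(probs.shape)
```

The two branches read the same tokens, so the concatenated vector carries both neighborhood-level and image-wide information before the classification layer. A trained model would replace every random weight here and would likely use deeper branches.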
