Abstract: Sign language recognition technology can help people with hearing impairments communicate with hearing people. Deep learning now provides technical support for sign language recognition. In sign language recognition tasks, traditional convolutional neural networks used to extract spatio-temporal features from sign language videos suffer from insufficient feature extraction, resulting in low recognition rates. Moreover, large video-based sign language datasets require a significant amount of computing resources for training while ensuring the generalization of the network, which poses a further challenge for recognition. In this paper, we present a video-based sign language recognition method based on Residual Network (ResNet) and Long Short-Term Memory (LSTM). We use the ResNet convolutional network as the backbone model: as the number of network layers increases, ResNet can effectively mitigate the gradient explosion problem and obtain better time-series features. LSTM uses gates to control the cell state and update the output feature values of the sequence. ResNet first extracts the sign language features; the learned feature space is then used as the input of the LSTM network to obtain long-sequence features. The combined model can effectively extract the spatio-temporal features in sign language videos and improve the recognition rate of sign language actions. An extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed method, which achieves an accuracy of 85.26%, an F1-score of 84.98%, and a precision of 87.77% on the Argentine Sign Language dataset (LSA64).
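The data flow described in the abstract (per-frame features from a residual network, fed frame by frame into a gated LSTM) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: a single dense residual layer stands in for the convolutional ResNet backbone, and every name, dimension, and random weight below (`residual_block`, `lstm_step`, `encode_video`, `D`, `H`, `T`) is a hypothetical choice made only for illustration. The final classification layer is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(x, W):
    """y = x + relu(W x). The identity shortcut is the core idea of ResNet;
    a real backbone uses convolutions, this dense layer is a stand-in."""
    fx = [max(0.0, a) for a in matvec(W, x)]
    return [xi + fi for xi, fi in zip(x, fx)]

def lstm_step(x, h, c, params):
    """One LSTM time step. params maps a gate name to (W, U, b)."""
    def gate(name, act):
        W, U, b = params[name]
        pre = [wx + uh + bi
               for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]
        return [act(p) for p in pre]

    i = gate("i", sigmoid)    # input gate: how much new content to write
    f = gate("f", sigmoid)    # forget gate: how much old cell state to keep
    o = gate("o", sigmoid)    # output gate: how much cell state to expose
    g = gate("g", math.tanh)  # candidate cell content
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def encode_video(frames, res_W, lstm_params, hidden):
    """Extract a feature vector per frame, then fold the sequence with the LSTM.
    Returns the last hidden state as the video representation."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for frame in frames:
        feat = residual_block(frame, res_W)
        h, c = lstm_step(feat, h, c, lstm_params)
    return h

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Toy sizes (assumptions): 4-dim frame vectors, 3 hidden units, 5 frames.
rng = random.Random(0)
D, H, T = 4, 3, 5
res_W = random_matrix(rng, D, D)
lstm_params = {
    name: (random_matrix(rng, H, D), random_matrix(rng, H, H), [0.0] * H)
    for name in "ifog"
}
frames = [[rng.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(T)]
video_repr = encode_video(frames, res_W, lstm_params, H)
```

In a full system the last hidden state would be passed to a classifier over the sign vocabulary. The sketch only shows how the spatial stage and the temporal stage are chained.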

Automatic Segmentation of Sign Language into Subtitle-Units

Sign Language Video Retrieval with Free-Form Textual Queries

Aligning Subtitles in Sign Language Videos

Automatic dense annotation of large-vocabulary sign language videos

Subunit Boundary Detection for Sign Language Recognition Using Spatio-temporal Modelling

Video-Based Sign Language Recognition Without Temporal Segmentation

Linguistically Motivated Sign Language Segmentation

Modelling and Segmenting Subunits for Sign Language Recognition Based on Hand Motion Analysis

Boosted Subunits: a Framework for Recognising Sign Language from Videos

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

Hierarchical LSTM for Sign Language Translation

SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation

Natural Language-Assisted Sign Language Recognition

Sign Language Recognition without frame-sequencing constraints: A proof of concept on the Argentinian Sign Language

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

Automatic Skin Segmentation and Tracking in Sign Language Recognition

Video-Based Sign Language Recognition via ResNet and LSTM Network

Sign Language Production with Latent Motion Transformer

SignGraph: A Sign Sequence is Worth Graphs of Nodes

Conditional Sentence Generation and Cross-modal Reranking for Sign Language Translation

Watch, read and lookup: learning to spot signs from multiple supervisors