Abstract: Research in the field of sign language recognition has made significant advances in recent years. The present achievements provide the basis for future applications with the objective of supporting the integration of deaf people into the hearing society. Translation systems, for example, could facilitate communication between deaf and hearing people in public situations. Further applications, such as user interfaces and automatic indexing of signed videos, become feasible. The current state of sign language recognition is roughly 30 years behind speech recognition, which corresponds to the gradual transition from isolated to continuous recognition for small vocabulary tasks. Research efforts have mainly focused on robust feature extraction or statistical modeling of signs. However, current recognition systems are still designed for signer-dependent operation under laboratory conditions. This paper describes a comprehensive concept for robust visual sign language recognition that reflects recent developments in this field. The proposed recognition system aims for signer-independent operation and uses a single video camera for data acquisition to ensure user-friendliness. Since sign languages make use of manual and facial means of expression, both channels are employed for recognition. For mobile operation in uncontrolled environments, sophisticated algorithms were developed that robustly extract manual and facial features. The extraction of manual features relies on a multiple hypotheses tracking approach to resolve ambiguities of hand positions. For facial feature extraction, an active appearance model is applied, which allows identification of areas of interest such as the eye and mouth regions. In the next processing step, a numerical description of the facial expression, head pose, line of sight, and lip outline is computed. The system also employs a resolution strategy for cases in which the signer’s hands and face overlap.
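The idea behind multiple hypotheses tracking can be illustrated with a minimal Python sketch. It is not the system's actual tracker: the function name `track_hand`, the Euclidean movement cost, and the pruning limit `max_hypotheses` are assumptions made for illustration only, and every frame is assumed to contain at least one candidate position. Instead of committing to one hand candidate per frame, several candidate trajectories are kept alive and the decision is deferred until more frames have been seen.

```python
from math import dist  # Python 3.8+


def track_hand(candidates_per_frame, max_hypotheses=3):
    """Return the candidate trajectory with the lowest total movement cost.

    candidates_per_frame: list of frames, each a non-empty list of (x, y)
    candidate hand positions (hypothetical input format).
    """
    # Each hypothesis is a pair: (accumulated cost, trajectory so far).
    hypotheses = [(0.0, [point]) for point in candidates_per_frame[0]]
    for candidates in candidates_per_frame[1:]:
        extended = []
        for cost, path in hypotheses:
            for point in candidates:
                # Assumed cost: distance moved between consecutive frames.
                extended.append((cost + dist(path[-1], point), path + [point]))
        # Defer the decision, but keep only the best few hypotheses.
        extended.sort(key=lambda h: h[0])
        hypotheses = extended[:max_hypotheses]
    return hypotheses[0][1]


frames = [[(0, 0)], [(1, 0), (10, 10)], [(2, 0), (11, 11)]]
best = track_hand(frames)
```

In this toy input the second candidate in each frame stands in for a distractor such as the face or the other hand; the smooth trajectory along the x-axis ends up with the lowest accumulated cost.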
Classification is based on hidden Markov models, which are able to compensate for temporal and amplitude variance in the articulation of a sign. The classification stage is designed for recognition of isolated signs as well as continuous sign language. In the latter case, a stochastic language model can be used, which considers unigram and bigram probabilities of single and successive signs. In the statistical reference models, each sign is represented either as a whole or as a composition of smaller subunits, similar to phonemes in spoken languages. While recognition based on word models is limited to rather small vocabularies, subunit models open the door to large vocabularies. Achieving signer independence is a challenging problem, as the articulation of a sign is subject to high interpersonal variance. This problem cannot be solved by simple feature normalization and must be addressed at the classification level. Therefore, dedicated adaptation methods known from speech recognition were implemented and modified to consider the specifics of sign languages. For rapid adaptation to unknown signers, the proposed recognition system employs a combined approach of maximum likelihood linear regression and maximum a posteriori estimation.
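How a hidden Markov model absorbs timing differences in isolated sign recognition can be shown with a small Python sketch. It rests on simplifying assumptions that are not taken from the paper: observations are discrete symbols rather than continuous feature vectors, and the two sign models `HELLO` and `THANKS`, their parameters, and the helper names are hypothetical. Each sign has its own model, and the sign whose model assigns the highest likelihood to the observation sequence is chosen.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n_states = len(start)
    # Initialisation: start in state s and emit the first symbol.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        # Recursion: sum over all predecessor states, then emit the symbol.
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the name of the model with the highest likelihood for obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over the symbols 0 and 1.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "HELLO": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),   # 0s, then 1s
    "THANKS": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),  # 1s, then 0s
}

fast = classify([0, 1, 1], models)
slow = classify([0, 0, 0, 1], models)
other = classify([1, 1, 0], models)
```

The sequences `[0, 1, 1]` and `[0, 0, 0, 1]` spend different amounts of time in the first phase of the sign, yet both score highest under the same model because the self-transition lets a state last for a variable number of frames. A practical system would work with log probabilities to avoid numerical underflow on long sequences.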
