Abstract: Sign language is one of the most effective communication tools for people with hearing difficulties. Most existing works focus on improving the performance of sign language tasks on RGB videos, which may suffer from degraded recording conditions, such as motion blur caused by fast hand movements and the signer's textured appearance. The bio-inspired event camera, which asynchronously captures brightness changes at high speed, can naturally perceive dynamic hand movements, providing rich manual cues for sign language tasks. In this work, we explore the potential of event cameras for continuous sign language recognition (CSLR) and sign language translation (SLT). To promote this research, we first collect EvSign, an event-based benchmark for both tasks with gloss and spoken language annotations. The EvSign dataset offers a substantial amount of high-quality event streams and an extensive vocabulary of glosses and words, thereby facilitating the development of sign language tasks. In addition, we propose an efficient transformer-based framework for event-based CSLR and SLT that fully leverages the advantages of streaming events. A sparse backbone extracts visual features from the sparse events, and temporal coherence is then exploited through the proposed local token fusion and gloss-aware temporal aggregation modules. Extensive experimental results are reported on both the simulated PHOENIX14T dataset and the EvSign dataset. Our method performs favorably against existing state-of-the-art approaches with only 0.34% of the computational cost (0.84 GFLOPs per video) and 44.2% of the network parameters. The project is available at https://zhang-pengyu.github.io/EVSign.
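An event camera reports a stream of asynchronous brightness changes rather than full frames, so the stream has to be converted into some tensor before a network can consume it. A common choice is to bin events into a small stack of per-polarity count frames. The sketch below shows that conversion under assumed conventions: an (N, 4) array with columns x, y, t, polarity, polarity in {0, 1}, and uniform time bins. The function names `events_to_frames` and `active_ratio` are hypothetical and are not taken from the EvSign code; the paper's actual event representation may differ. The active-cell ratio hints at why a sparse backbone can save computation, since binned event tensors typically contain many empty cells.

```python
import numpy as np


def events_to_frames(events: np.ndarray, height: int, width: int, num_bins: int) -> np.ndarray:
    """Accumulate an event stream into per-polarity count frames (hypothetical helper).

    events: array of shape (N, 4) with columns (x, y, t, p), where p is 0 or 1.
    Returns a float32 array of shape (num_bins, 2, height, width).
    """
    frames = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    if len(events) == 0:
        return frames
    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.int64)
    # Normalize timestamps to [0, num_bins] and assign each event to a uniform temporal bin;
    # the last timestamp lands exactly on num_bins, so clip it into the final bin.
    t0, t1 = t.min(), t.max()
    span = max(t1 - t0, 1e-9)
    b = np.clip(((t - t0) / span * num_bins).astype(np.int64), 0, num_bins - 1)
    # Unbuffered scatter-add: repeated (bin, polarity, y, x) indices are all counted.
    np.add.at(frames, (b, p, y, x), 1.0)
    return frames


def active_ratio(frames: np.ndarray) -> float:
    """Fraction of (bin, polarity, pixel) cells that received at least one event."""
    return float(np.count_nonzero(frames)) / frames.size


if __name__ == "__main__":
    # Synthetic stream of 500 random events on a 64x48 sensor (illustrative data only).
    rng = np.random.default_rng(0)
    n = 500
    events = np.stack([
        rng.integers(0, 64, n),             # x
        rng.integers(0, 48, n),             # y
        np.sort(rng.uniform(0.0, 1.0, n)),  # t in seconds
        rng.integers(0, 2, n),              # polarity
    ], axis=1)
    frames = events_to_frames(events, height=48, width=64, num_bins=8)
    print(frames.shape)  # (8, 2, 48, 64)
    print(frames.sum())  # 500.0, every event counted once
    print(f"active cells: {active_ratio(frames):.3f}")
```

Count frames discard sub-bin timing; voxel grids that interpolate each event between neighboring bins are a common alternative when finer temporal detail matters.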

Enabling Real-time Sign Language Translation on Mobile Platforms with On-board Depth Cameras

Neural Sign Language Translation Based on Human Keypoint Estimation

Enhancing Bidirectional Sign Language Communication: Integrating YOLOv8 and NLP for Real-Time Gesture Recognition & Translation

American Sign Language Translation Using Wearable Inertial and Electromyography Sensors for Tracking Hand Movements and Facial Expressions

SignSpeaker: A Real-time, High-Precision SmartWatch-based Sign Language Translator

EvSign: Sign Language Recognition and Translation with Streaming Events

Hear Sign Language: A Real-Time End-to-End Sign Language Recognition System

Android-Based Application for Real-Time Indonesian Sign Language Recognition Using Convolutional Neural Network

Everybody Sign Now: Translating Spoken Language to Photo Realistic Sign Language Video

SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale

Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation

DeepASL: Enabling Ubiquitous and Non-Intrusive Word and Sentence-Level Sign Language Translation

Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

Towards Real-Time Sign Language Recognition and Translation on Edge Devices

Signs as Tokens: An Autoregressive Multilingual Sign Language Generator

Two-Stream Network for Sign Language Recognition and Translation

Robust Sign Language Recognition System Using ToF Depth Cameras

A two-way translation system of Chinese sign language based on computer vision

Keypoint based Sign Language Translation without Glosses

A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

HearASL: Your Smartphone Can Hear American Sign Language