Recognizing American Sign Language Manual Signs from RGB-D Videos

Elahe Vahdani, Longlong Jing, Matt Huenerfauth, YingLi Tian
DOI: https://doi.org/10.2139/ssrn.4019317
2022-01-01
SSRN Electronic Journal
Abstract: In this paper, we propose a multi-stream framework based on a 3D Convolutional Neural Network (3DCNN) to recognize American Sign Language (ASL) manual signs and non-manual gestures (face and head movements) in real time from RGB-D videos. The framework fuses multimodal features, including hand gestures, facial expressions, and body poses, from multiple channels (RGB, depth, motion, and skeleton joints). To learn the overall temporal dynamics of a video, a proxy video is generated for each video by selecting a subset of its frames, and these proxy videos are then used to train the proposed 3DCNN model. We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos of 100 ASL manual signs captured by a Microsoft Kinect V2 camera. Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses in the ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on the Chalearn IsoGD dataset.
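The proxy-video step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the subset of frames is chosen, so the evenly spaced selection below is an assumption, not the authors' method. The function names `proxy_frame_indices` and `make_proxy_video` and the default length of 16 frames are also hypothetical.

```python
from __future__ import annotations


def proxy_frame_indices(num_frames: int, proxy_len: int) -> list[int]:
    """Return proxy_len frame indices spread evenly over a video.

    Assumption: uniform temporal sampling. The source only states that
    a subset of frames is selected for each video.
    """
    if num_frames <= 0 or proxy_len <= 0:
        raise ValueError("num_frames and proxy_len must be positive")
    if num_frames <= proxy_len:
        # Short video: keep every frame, then repeat the last frame as padding
        # so that every proxy video has the same fixed length.
        return list(range(num_frames)) + [num_frames - 1] * (proxy_len - num_frames)
    # Long video: split the timeline into proxy_len equal segments and
    # take the frame at the centre of each segment.
    return [int((i + 0.5) * num_frames / proxy_len) for i in range(proxy_len)]


def make_proxy_video(frames: list, proxy_len: int = 16) -> list:
    """Build a fixed-length proxy video from a list of frames."""
    return [frames[i] for i in proxy_frame_indices(len(frames), proxy_len)]
```

A fixed-length proxy video of this kind covers the whole sign from start to end while keeping the temporal input size constant, which is what a 3D CNN with a fixed input shape needs. In a multi-stream setting, the same indices could be applied to each channel (RGB, depth, motion) so that the streams stay aligned in time.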