3D gesture segmentation for word-level Arabic sign language using large-scale RGB video sequences and autoencoder convolutional networks

Abdelbasset Boukdir, Mohamed Benaddy, Ayoub Ellahyani, Othmane El Meslouhi, Mustapha Kardouchi
DOI: https://doi.org/10.1007/s11760-022-02167-6
2022-02-23
Abstract: Sign languages use hands, body movements, and facial expressions to deliver a message. Developing a communication environment for the deaf community is a social and economic necessity. Research has been conducted on the segmentation of gestures to develop methods capable of identifying a given sequence of signs and understanding their meaning. However, the variety of hand shapes and the complexity of gestures remain a challenge. In this paper, we propose a novel model, the 3D gesture segmentation network (3D GS-Net), which segments gestures from video sequences for word-level Arabic sign language (ArSL) using a small number of features. To process and analyze the frame sequences efficiently, annotation and normalization are applied to the dataset. During the training phase, the preprocessed data are fed into the 3D GS-Net model, which uses an autoencoder convolutional network architecture designed as a two-branch network whose branches are merged at the final layer to produce the predicted segmentation output. The proposed 3D GS-Net was evaluated on RGB videos from the Moroccan sign language (MoSL) dataset. The results were compared with existing approaches and demonstrate the effectiveness and efficiency of the 3D GS-Net approach in gesture segmentation across several evaluation metrics.
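The abstract does not name the evaluation metrics it uses. As an illustration only, the sketch below computes two overlap scores that are commonly used for segmentation masks, intersection over union (IoU) and the Dice coefficient. The function name `iou_and_dice`, the flat binary-mask input format, and the choice of these two metrics are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence, Tuple


def iou_and_dice(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (IoU, Dice) for two flattened binary masks of equal length.

    Hypothetical helper: the paper's actual metrics are not stated in the abstract.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of elements")

    # Count pixels/voxels marked as gesture in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_count = sum(1 for p in pred if p)
    target_count = sum(1 for t in target if t)
    union = pred_count + target_count - intersection

    # Two empty masks agree perfectly; this also avoids division by zero.
    if union == 0:
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_count + target_count)
    return iou, dice


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    ground_truth = [1, 0, 1, 0]
    print(iou_and_dice(predicted, ground_truth))
```

For a video clip, the predicted and ground-truth masks of all frames could be flattened into one sequence each before calling the function, so that a single score summarizes the whole clip.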