TIM-SLR: a Lightweight Network for Video Isolated Sign Language Recognition

Fei Wang, Libo Zhang, Hao Yan, Shuai Han
DOI: https://doi.org/10.1007/s00521-023-08873-7
2023-01-01
Neural Computing and Applications
Abstract: Research on video-based isolated sign language recognition (SLR) algorithms has made rapid progress, but several problems in the field still need to be solved urgently. On the one hand, traditional sign language acquisition equipment is expensive and hard to carry. Sign language data collected with Kinect contain rich information, but the device is complicated to use. Data acquired by RGB cameras are better suited to practical applications, but existing sign language datasets collected with RGB cameras have few demonstrators and small vocabularies. On the other hand, most existing SLR methods rely on complex network structures to achieve high accuracy, and complex networks mean longer inference time, which falls short of what practical application scenarios require. In this paper, we propose a large-scale Chinese isolated sign language dataset named CSLD, collected with an RGB camera; it covers 400 words, each illustrated 10 times by 30 demonstrators. In addition, we propose a lightweight network, TIM-SLR. To verify that the network is both lightweight and effective, we conducted experiments on the sign language datasets CSLD and LSA64, obtaining 91.6% and 99.8% accuracy, respectively, and on the action recognition datasets Something-Something (Sth-Sth) V1 and V2, achieving state-of-the-art performance on both. Because TIM-SLR is composed only of 2D convolutions and a temporal interaction module (TIM), it not only achieves higher accuracy but also has an inference speed and parameter count suitable for practical application scenarios.
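The abstract does not state how many video clips CSLD contains. Assuming a fully crossed recording design, in which each of the 30 demonstrators performs each of the 400 words 10 times (the abstract's wording suggests this but does not say so explicitly), the dataset size follows directly from the quoted figures. The short Python sketch below does this arithmetic; the helper name `total_clips` is hypothetical and not taken from the paper.

```python
# Hypothetical helper: clip count for an isolated SLR dataset, ASSUMING every
# demonstrator performs every word the same number of times (fully crossed design).
def total_clips(num_words: int, num_signers: int, reps_per_signer: int) -> int:
    """Return the number of video clips in a fully crossed recording design."""
    return num_words * num_signers * reps_per_signer


# Figures quoted in the abstract for CSLD.
CSLD_WORDS = 400
CSLD_SIGNERS = 30
CSLD_REPS = 10

clips = total_clips(CSLD_WORDS, CSLD_SIGNERS, CSLD_REPS)
clips_per_word = clips // CSLD_WORDS      # samples available per class
clips_per_signer = clips // CSLD_SIGNERS  # recordings made by each demonstrator

print(clips, clips_per_word, clips_per_signer)  # 120000 300 4000
```

Under this assumption, each word class would have 300 samples, which is the kind of per-class count that distinguishes a large-scale dataset from the small-vocabulary, few-demonstrator RGB datasets the abstract criticizes. If the 10 repetitions were instead shared across all demonstrators, the total would be much smaller, so the paper's dataset section should be checked before relying on this figure.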