ST-Xception: A Depthwise Separable Convolution Network for Military Sign Language Recognition

Yuhao Zhang, Jun Liao, Mengyuan Ran, Xin Li, Shanshan Wang, Li Liu
DOI: https://doi.org/10.1109/SMC42975.2020.9283407
2020-01-01
Abstract: Military sign language is an important form of tactical communication, especially in restricted situations where either distance or a requirement for silence precludes oral means. Unfortunately, when soldiers cannot see each other, tactical gestures are no longer an effective mode of communication, which may hinder military operations. Vision-based approaches have been at the forefront of hand gesture recognition. However, there is still a lack of dedicated datasets and models for the task of military sign language recognition. In this paper, we collect a new first-person dataset named MSL, which contains 3,840 tactical gesture samples in 16 classes, recorded in battle scenarios, with more than 110,000 video frames performed by 10 subjects. Moreover, we present a novel deep network, called ST-Xception, built on depthwise separable convolutions to recognize such military sign language. By expanding the convolution filters and pooling kernels into 3D, our network can characterize the inherent spatio-temporal relationships of a given tactical hand gesture. In particular, we further reduce computational cost and relieve overfitting by replacing the fully connected layers with adaptive average pooling. Experimental results show that our model outperforms existing models on both our in-house MSL dataset and two other benchmark datasets.
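The two cost-saving ideas named in the abstract, depthwise separable convolutions expanded to 3D and a pooled classifier in place of fully connected layers, can be illustrated by counting weights. The sketch below is not the authors' implementation: the layer sizes are hypothetical, biases are omitted, and the pooled head is assumed to keep a single linear mapping from channels to classes, which the abstract does not specify. Only the 16-class output comes from the abstract.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (bias omitted)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise separable 3D convolution (bias omitted)."""
    depthwise = c_in * k ** 3   # one k x k x k filter per input channel
    pointwise = c_in * c_out    # 1 x 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def fc_head_params(c, t, h, w, n_classes):
    """Weights in a fully connected classifier on a flattened c x t x h x w feature map."""
    return c * t * h * w * n_classes


def pooled_head_params(c, n_classes):
    """Weights in a classifier applied after pooling each channel to a single value.

    Assumption: one linear layer remains after pooling; the pooling itself has no weights.
    """
    return c * n_classes


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, k = 128, 256, 3
    print("standard 3D conv: ", conv3d_params(c_in, c_out, k))
    print("separable 3D conv:", separable_conv3d_params(c_in, c_out, k))

    # 16 classes as in the MSL dataset; the feature-map shape is hypothetical.
    print("flattened FC head:", fc_head_params(256, 2, 7, 7, 16))
    print("pooled head:      ", pooled_head_params(256, 16))
```

Under these assumed sizes, both substitutions cut the weight count by more than an order of magnitude, which is consistent with the abstract's claim of reduced computational cost and less overfitting.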