Temporal superimposed crossover module for effective continuous sign language

Qidan Zhu, Jing Li, Fei Yuan, Quan Gan
DOI: https://doi.org/10.1007/s00138-024-01595-3
IF: 2.983
2024-08-21
Machine Vision and Applications
Abstract: The ultimate goal of continuous sign language recognition (CSLR) is to facilitate communication between deaf or hard-of-hearing people and hearing people, which places high demands on the real-time performance and deployability of the model. However, previous studies on CSLR have paid little attention to these two properties. In this paper, we propose a novel CSLR model, ResNetT, based on the temporal superimposed crossover module (TSCM) and ResNet. It replaces parameterized computation with shifts in the temporal dimension and efficiently extracts temporal features without increasing the number of parameters or the amount of computation. ResNetT improves the real-time performance and deployability of the model while preserving its accuracy. The core is our proposed zero-parameter, zero-computation module TSCM, which we combine with 2D convolution to form the "TSCM+2D" hybrid convolution. Compared with other spatial-temporal convolutions, it provides powerful spatial-temporal modeling capability with no increase in parameters and a lower deployment cost. Further, we apply "TSCM+2D" to ResBlock to form the new ResBlockT, which is the basis of the novel CSLR model ResNetT. We introduce stochastic gradient stops and a multilevel connectionist temporal classification (CTC) loss to train this model, which reduces training memory usage, decreases the final word error rate (WER), and extends the ResNet network from image classification tasks to video recognition tasks. In addition, this study is the first in the field of CSLR to use only 2D convolution to extract spatial-temporal features of sign language videos for end-to-end recognition learning. Experiments on two large-scale continuous sign language datasets demonstrate the efficiency of the method.
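The idea of replacing parameterized temporal computation with shifts along the time axis can be illustrated with a minimal sketch. The abstract does not specify TSCM's exact shifting pattern, so the code below is not the authors' module: it is a generic channel-wise temporal shift, and the function name `temporal_shift`, the parameter `fold_div`, and the (T, C, H, W) layout are all assumptions made for illustration. It only shows how neighbouring frames can exchange information with no learnable parameters and no multiplications before an ordinary 2D convolution is applied.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Parameter-free temporal shift (illustrative, not the paper's TSCM).

    x: array of shape (T, C, H, W), one feature map per video frame.
    The first C // fold_div channels are shifted one frame forward in time,
    the next C // fold_div channels one frame backward, and the rest are
    left unchanged. Vacated positions are zero-filled.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # Frame t receives these channels from frame t-1.
    out[1:, :fold] = x[:-1, :fold]
    # Frame t receives these channels from frame t+1.
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]
    # Remaining channels stay in place.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out

# Usage: 3 frames, 8 channels, 1x1 spatial size.
frames = np.arange(3 * 8, dtype=np.float32).reshape(3, 8, 1, 1)
shifted = temporal_shift(frames, fold_div=4)
```

Because the operation only moves data, a 2D convolution applied to `shifted` sees features from adjacent frames without any extra weights, which is the general sense in which a shift-plus-2D design can stand in for a spatial-temporal convolution.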
computer science, cybernetics, artificial intelligence, engineering, electrical & electronic