Difference-guided multi-scale spatial-temporal representation for sign language recognition

Liqing Gao, Lianyu Hu, Fan Lyu, Lei Zhu, Liang Wan, Chi-Man Pun, Wei Feng
DOI: https://doi.org/10.1007/s00371-023-02979-8
2023-07-31
Abstract: Sign language recognition (SLR) is a challenging task that requires a thorough understanding of spatial-temporal visual features in order to translate signing into comprehensible written or spoken language. However, existing SLR methods overlook key spatial-temporal representations, which are sparse and inconsistent across space and time. To solve this problem, we present a difference-guided multi-scale spatial-temporal representation (DMST) learning model for SLR. In DMST, we devise two modules: (1) key spatial-temporal representation, which extracts and enhances key spatial-temporal information through a spatial-temporal difference strategy, and (2) multi-scale sequence alignment, which perceives and fuses multi-scale spatial-temporal features and performs sequence mapping. The DMST model outperforms state-of-the-art methods on four public sign language datasets, which demonstrates the superiority of the DMST model and the significance of key spatial-temporal representation for SLR.
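The two ideas named in the abstract, difference-guided enhancement and multi-scale fusion, can be illustrated with a toy sketch. This is not the DMST architecture, which the abstract does not specify; it only shows one simple way frame-to-frame differences could reweight features and how several temporal window sizes could be fused. All names here (`temporal_differences`, `enhance_by_difference`, `multi_scale_pool`, the `window_sizes` parameter) and the representation of a frame as a flat list of floats are assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: each video frame is already reduced to a flat feature vector.
Frame = List[float]


def temporal_differences(frames: List[Frame]) -> List[float]:
    """Mean absolute change of each frame relative to its predecessor."""
    diffs = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        diffs.append(sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur))
    return diffs


def enhance_by_difference(frames: List[Frame]) -> List[Frame]:
    """Scale each frame by 1 + its normalized difference score.

    Frames that change most from their predecessor are emphasized;
    this weighting rule is a hypothetical stand-in, not the paper's.
    """
    diffs = temporal_differences(frames)
    peak = max(diffs) or 1.0  # avoid dividing by zero for a static clip
    return [[x * (1.0 + d / peak) for x in frame]
            for frame, d in zip(frames, diffs)]


def multi_scale_pool(frames: List[Frame],
                     window_sizes: Sequence[int] = (1, 2, 4)) -> List[Frame]:
    """Average-pool over causal windows of several sizes, then average the scales."""
    n, dim = len(frames), len(frames[0])
    fused = [[0.0] * dim for _ in range(n)]
    for w in window_sizes:
        for t in range(n):
            window = frames[max(0, t - w + 1):t + 1]
            for k in range(dim):
                mean_k = sum(f[k] for f in window) / len(window)
                fused[t][k] += mean_k / len(window_sizes)
    return fused
```

In this sketch, a clip would first pass through `enhance_by_difference` and then through `multi_scale_pool`; a learned model would replace both hand-written rules with trainable layers.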
computer science, software engineering