Alleviating Data Insufficiency for Chinese Sign Language Recognition

Wanli Xue, Jingze Liu, Siyi Yan, Yuxi Zhou, Tiantian Yuan, Qing Guo
DOI: https://doi.org/10.1007/s44267-023-00028-5
2023-01-01
Abstract: Continuous Chinese sign language recognition (CCSLR) methods have shown a strong ability to learn excellent model architectures from datasets. However, data insufficiency makes the CCSLR task difficult to complete. In this work, we focus on a simple but important way to alleviate data insufficiency: refining the model architecture of a CCSLR network, and improving the robustness of its feature processing, by using better-quality non-Chinese sign language datasets. To this end, we first conducted a simple empirical study to verify the feasibility of knowledge transfer in the CCSLR task. Surprisingly, just by pre-training our recognition model on a foreign sign language dataset, we can refine the model architecture and improve its robustness significantly. To make this practical, the key issue of how to fine-tune the existing feature processing models for effective guidance must be carefully investigated. We then propose a novel scheme for fine-tuning pre-trained models, named FTP, which updates the spatial feature extractor initialized by a pre-trained backbone and freezes the temporal feature extractor implemented by a better shareable transformer encoder. Compared with the baseline method, our FTP method achieves a significant performance improvement on the public dataset USTC-CCSL.
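The update-and-freeze split that the abstract attributes to FTP can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class `TinyCSLRModel`, the helper `apply_ftp_style_freezing`, the layer sizes, and the choice to keep the classifier trainable are all assumptions made for illustration. In practice both extractors would be loaded from a model pre-trained on a foreign sign language dataset rather than randomly initialized.

```python
import torch
from torch import nn


class TinyCSLRModel(nn.Module):
    """Hypothetical toy model: per-frame spatial extractor + transformer temporal extractor."""

    def __init__(self, feat_dim=32, num_glosses=10):
        super().__init__()
        # Spatial feature extractor (stand-in for a pre-trained backbone).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        # Temporal feature extractor (stand-in for the shareable transformer encoder).
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(layer, num_layers=1)
        # Per-frame gloss classifier (assumed; not described in the abstract).
        self.classifier = nn.Linear(feat_dim, num_glosses)

    def forward(self, video):
        # video: (batch, time, 3, height, width)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)                 # (batch * time, 3, H, W)
        feats = self.spatial(frames).view(b, t, -1)  # (batch, time, feat_dim)
        return self.classifier(self.temporal(feats)) # (batch, time, num_glosses)


def apply_ftp_style_freezing(model):
    """Freeze the temporal extractor; return the parameters left trainable."""
    for p in model.temporal.parameters():
        p.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


model = TinyCSLRModel()
trainable = apply_ftp_style_freezing(model)
# Only the spatial extractor and the (assumed) classifier reach the optimizer.
optimizer = torch.optim.SGD(trainable, lr=0.01)
```

Gradients still flow through the frozen transformer encoder to the spatial extractor during back-propagation; only the encoder's own weights are excluded from updates.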