Improving Non-Autoregressive Sign Language Translation with Random Ordering Progressive Prediction Pretraining

Pei Yu, Changhao Lai, Cong Hu, Shan Liu, Liang Zhang, Biao Fu, Yidong Chen
DOI: https://doi.org/10.3233/faia240495
2024-01-01
Abstract: Recently, the Non-AutoRegressive (NAR) decoding mechanism, which effectively reduces the inference latency of text generation, has been applied to Sign Language Translation (SLT). In particular, the current best NAR SLT model, which uses a Curriculum-based Non-autoregressive Decoder (CND), outperforms AutoRegressive (AR) baselines in both speed and performance. Although AutoRegressive Pre-trained Language Models (AR-PLMs) have been shown to further boost the performance of AR SLT models, combining NAR Pre-trained Language Models (NAR-PLMs) with NAR SLT models remains challenging due to (1) existing NAR-PLMs’ inability to model token dependencies between decoder layers, which is crucial for NAR SLT models using CND; and (2) the modality gap between the decoder inputs of NAR-PLMs and those of NAR SLT models. To address these challenges, we propose a Random Ordering Progressive Prediction Pre-training task for NAR SLT models using CND, which enables the decoder to predict target sequences in diverse orderings and enhances the modeling of target token dependencies between layers. Moreover, we propose a CTC-enhanced Soft Copy method that incorporates target-side information into the decoder’s inputs, alleviating the modality gap. Experimental results on PHOENIX-2014T and CSL-Daily demonstrate that our model consistently outperforms all strong baselines and achieves performance competitive with AR SLT models equipped with AR-PLMs.