Higher Target Relevance Parallel Machine Translation with Low-Frequency Word Enhancement

Shuo Sun, Hongxu Hou, Yisong Wang
DOI: https://doi.org/10.1007/978-3-031-44198-1_28
2023-01-01
Abstract: Non-autoregressive translation (NAT) has attracted a surge of interest because it speeds up inference by predicting all target tokens independently and simultaneously. However, this paradigm struggles to model the conditional dependencies between words on the target side, which costs translation accuracy. Many recent studies improve its generation quality, but they do so at the cost of decoding speed. In this paper, we introduce an evaluation module that evaluates NAT outputs during training to guide parameter updates and that serves as a fine-tuning module during inference to produce more fluent and faithful predictions. This recipe significantly improves model performance while preserving the decoding efficiency of NAT. Furthermore, to mitigate the large prediction errors on low-frequency words caused by knowledge distillation (KD) in non-autoregressive generation, we propose an enhanced KD scheme for training NAT students that exploits the complementarity of bilingual and monolingual knowledge and transfers both to the NAT model. We verify our approach on the widely used WMT14 English-German and WMT16 Romanian-English tasks, and obtain further improvements on the low-resource CCMT2019 Mongolian-Chinese and CWMT2017 Uyghur-Chinese tasks.
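For readers unfamiliar with the trade-off described in the abstract, the contrast between autoregressive and non-autoregressive decoding is commonly written as the two factorizations below. This is the standard textbook formulation, not notation taken from this paper; the symbols x (source sentence), y (target sentence), T (target length), and θ (model parameters) are assumptions introduced here for illustration.

```latex
% Autoregressive translation: each token is conditioned on all previous target tokens,
% so decoding takes T sequential steps.
p_{\mathrm{AT}}(y \mid x; \theta) = \prod_{t=1}^{T} p\left(y_t \mid y_{<t},\, x;\, \theta\right)

% Non-autoregressive translation: given a predicted length T, tokens are conditionally
% independent given the source, so all T positions can be predicted in parallel.
p_{\mathrm{NAT}}(y \mid x; \theta) = p\left(T \mid x;\, \theta\right) \prod_{t=1}^{T} p\left(y_t \mid x;\, \theta\right)
```

Dropping the dependence on y_{<t} in the second product is what enables parallel decoding, and it is also why the model has difficulty capturing dependencies between target-side words.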