Abstract: As a new neural machine translation approach, Non-Autoregressive machine Translation (NAT) has attracted attention recently due to its high efficiency in inference. However, the high efficiency has come at the cost of not capturing the sequential dependency on the target side of translation, which causes NAT to suffer from two kinds of translation errors: 1) repeated translations (due to indistinguishable adjacent decoder hidden states), and 2) incomplete translations (due to incomplete transfer of source-side information via the decoder hidden states). In this paper, we propose to address these two problems by improving the quality of decoder hidden representations via two auxiliary regularization terms in the training process of an NAT model. First, to make the hidden states more distinguishable, we regularize the similarity between consecutive hidden states based on the corresponding target tokens. Second, to force the hidden states to contain all the information in the source sentence, we leverage the dual nature of translation tasks (e.g., English to German and German to English) and minimize a backward reconstruction error to ensure that the hidden states of the NAT decoder are able to recover the source-side sentence. Extensive experiments conducted on several benchmark datasets show that both regularization strategies are effective and can alleviate the issues of repeated translations and incomplete translations in NAT models. The accuracy of NAT models is therefore improved significantly over the state-of-the-art NAT models with even better efficiency for inference.
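The first regularization term can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form below is an assumption: for each pair of consecutive positions, if the target token embeddings are similar (cosine similarity at or above a hypothetical `threshold`), the hidden states are pushed to be similar; otherwise they are pushed apart. The names `cosine`, `similarity_regularizer`, and `threshold` are illustrative and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_regularizer(hidden_states, target_embeddings, threshold=0.5):
    """Assumed form of a similarity penalty over consecutive decoder states.

    hidden_states:     list of decoder hidden vectors, one per target position
    target_embeddings: list of target token embedding vectors, same length
    threshold:         hypothetical cutoff deciding whether two adjacent
                       target tokens count as "similar"
    """
    if len(hidden_states) != len(target_embeddings):
        raise ValueError("hidden_states and target_embeddings must align")

    penalty = 0.0
    for t in range(len(hidden_states) - 1):
        s_hidden = cosine(hidden_states[t], hidden_states[t + 1])
        s_target = cosine(target_embeddings[t], target_embeddings[t + 1])
        if s_target >= threshold:
            # Adjacent target tokens are alike: reward similar hidden states.
            penalty += 1.0 - s_hidden
        else:
            # Adjacent target tokens differ: penalize similar hidden states,
            # which is the situation that leads to repeated translations.
            penalty += 1.0 + s_hidden
    return penalty


# Toy example: targets differ at positions 0/1 and match at positions 1/2.
targets = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # states 0/1 indistinguishable
aligned = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]    # states follow the targets

print(similarity_regularizer(collapsed, targets))
print(similarity_regularizer(aligned, targets))
```

In this toy setting the hidden states that mirror the target tokens receive a lower penalty than the ones that collapse two different tokens onto the same state. In training, such a term would be added to the translation loss with a weight, alongside the backward reconstruction term described above.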

Higher Target Relevance Parallel Machine Translation with Low-Frequency Word Enhancement

Hybrid-Regressive Paradigm for Accurate and Speed-Robust Neural Machine Translation

Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input

Neighbors Are Not Strangers: Improving Non-Autoregressive Translation under Low-Frequency Lexical Constraints

Non-Autoregressive Machine Translation with Auxiliary Regularization

Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation

Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation

Redistributing Low-Frequency Words: Making the Most of Monolingual Data in Non-Autoregressive Translation

Progressive Multi-Granularity Training for Non-Autoregressive Translation

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC

Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation

Context-Aware Cross-Attention for Non-Autoregressive Translation

MR-P: A Parallel Decoding Algorithm for Iterative Refinement Non-Autoregressive Translation

Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation

Non-Autoregressive Document-Level Machine Translation

Glancing Transformer for Non-Autoregressive Neural Machine Translation

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation