Abstract: The Neural Machine Translation (NMT) model is essentially a joint language model conditioned on both the source sentence and partial translation. Therefore, the NMT model naturally involves the mechanism of the Language Model (LM) that predicts the next token only based on partial translation. Despite its success, NMT still suffers from the hallucination problem, generating fluent but inadequate translations. The main reason is that NMT pays excessive attention to the partial translation while neglecting the source sentence to some extent, namely overconfidence of the LM. Accordingly, we define the Margin between the NMT and the LM, calculated by subtracting the predicted probability of the LM from that of the NMT model for each token. The Margin is negatively correlated to the overconfidence degree of the LM. Based on the property, we propose a Margin-based Token-level Objective (MTO) and a Margin-based Sentence-level Objective (MSO) to maximize the Margin for preventing the LM from being overconfident. Experiments on WMT14 English-to-German, WMT19 Chinese-to-English, and WMT14 English-to-French translation tasks demonstrate the effectiveness of our approach, with 1.36, 1.50, and 0.63 BLEU improvements, respectively, compared to the Transformer baseline. The human evaluation further verifies that our approaches improve translation adequacy as well as fluency.
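
The Margin described in the abstract can be written out explicitly. The following is a sketch in notation that is assumed here for illustration and does not appear in the text above: $x$ is the source sentence, $y_t$ is the target token at position $t$, and $y_{<t}$ is the partial translation generated so far.

```latex
\Delta(t) = p_{\mathrm{NMT}}(y_t \mid y_{<t}, x) - p_{\mathrm{LM}}(y_t \mid y_{<t}),
\qquad -1 \le \Delta(t) \le 1
```

Because both terms are probabilities, the Margin lies between -1 and 1. A small or negative value means the LM predicts the token about as well as the NMT model does without seeing the source, which is the situation the abstract calls overconfidence of the LM.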

What problem does this paper attempt to address?

This paper addresses the hallucination problem in neural machine translation (NMT): the model produces translations that are fluent but do not adequately match the source sentence. When generating a translation, an NMT model can attend too heavily to the partial translation produced so far and neglect the source sentence, which the authors describe as overconfidence of the language model (LM). In some cases this overconfidence makes the NMT model degenerate into a language model that depends only on the partial translation, producing inaccurate output.

To alleviate this problem, the authors define a new quantity, the Margin between the NMT model and the LM, obtained for each token by subtracting the LM's predicted probability from the NMT model's predicted probability. The smaller the Margin, the more overconfident the LM is. Based on this quantity, the authors propose two training objectives:

1. **Margin-based Token-level Objective (MTO)**: Maximizes the Margin of each token to prevent the LM from being overconfident.
2. **Margin-based Sentence-level Objective (MSO)**: Applies a dynamic weighting function at the sentence level to reduce the negative impact of "dirty data" in the training data, that is, target sentences that do not match their source sentences.

Experiments on large-scale translation tasks show BLEU improvements over the Transformer baseline of 1.36 on WMT14 English-to-German, 1.50 on WMT19 Chinese-to-English, and 0.63 on WMT14 English-to-French. Human evaluation further confirms that the methods improve both translation adequacy and fluency.
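
As a rough illustration of the per-token Margin described above, the following minimal Python sketch subtracts LM probabilities from NMT probabilities and flags tokens where the Margin is negative. The function names, the example probabilities, and the zero threshold are all assumptions made for illustration. This is not the authors' implementation, and it does not reproduce the MTO or MSO loss functions.

```python
from typing import List


def token_margins(p_nmt: List[float], p_lm: List[float]) -> List[float]:
    """Per-token Margin: NMT probability minus LM probability.

    Both lists hold the probability each model assigns to the same
    reference token at each target position (hypothetical inputs).
    """
    if len(p_nmt) != len(p_lm):
        raise ValueError("p_nmt and p_lm must have the same length")
    return [nmt - lm for nmt, lm in zip(p_nmt, p_lm)]


def low_margin_positions(margins: List[float], threshold: float = 0.0) -> List[int]:
    """Positions whose Margin falls below the threshold.

    The threshold of 0.0 is an illustrative choice, not a value from the paper.
    """
    return [i for i, m in enumerate(margins) if m < threshold]


# Made-up probabilities for a three-token target sentence.
p_nmt = [0.9, 0.6, 0.3]
p_lm = [0.2, 0.5, 0.7]

margins = token_margins(p_nmt, p_lm)
flagged = low_margin_positions(margins)
print(margins)
print(flagged)
```

In this toy input, the last token has a higher probability under the LM than under the NMT model, so its Margin is negative and it is the only position flagged. A training objective in the spirit of MTO would push such Margins upward. The exact form of that objective is not given in the summary above and is not sketched here.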

Prevent the Language Model from being Overconfident in Neural Machine Translation
