Prevent the Language Model from being Overconfident in Neural Machine Translation

Mengqi Miao, Fandong Meng, Yijin Liu, Xiao-Hua Zhou, Jie Zhou
DOI: https://doi.org/10.48550/arXiv.2105.11098
2021-05-31
Abstract: The Neural Machine Translation (NMT) model is essentially a joint language model conditioned on both the source sentence and partial translation. Therefore, the NMT model naturally involves the mechanism of the Language Model (LM) that predicts the next token only based on partial translation. Despite its success, NMT still suffers from the hallucination problem, generating fluent but inadequate translations. The main reason is that NMT pays excessive attention to the partial translation while neglecting the source sentence to some extent, namely overconfidence of the LM. Accordingly, we define the Margin between the NMT and the LM, calculated by subtracting the predicted probability of the LM from that of the NMT model for each token. The Margin is negatively correlated to the overconfidence degree of the LM. Based on the property, we propose a Margin-based Token-level Objective (MTO) and a Margin-based Sentence-level Objective (MSO) to maximize the Margin for preventing the LM from being overconfident. Experiments on WMT14 English-to-German, WMT19 Chinese-to-English, and WMT14 English-to-French translation tasks demonstrate the effectiveness of our approach, with 1.36, 1.50, and 0.63 BLEU improvements, respectively, compared to the Transformer baseline. The human evaluation further verifies that our approaches improve translation adequacy as well as fluency.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the hallucination problem in neural machine translation (NMT): translations that are fluent but do not adequately reflect the source sentence. When generating a translation, an NMT model can attend too heavily to the partial translation produced so far and neglect the source sentence, which the authors call overconfidence of the language model (LM). In such cases the NMT model effectively degenerates into a language model that depends only on the partial translation, and the output becomes inaccurate.

To alleviate this, the authors define the Margin between the NMT model and the LM: for each token, the probability predicted by the LM is subtracted from the probability predicted by the NMT model. The smaller the Margin, the more overconfident the LM. Based on this quantity, the authors propose two training objectives:

1. **Margin-based Token-level Objective (MTO)**: maximizes the Margin of each token to prevent the LM from being overconfident.
2. **Margin-based Sentence-level Objective (MSO)**: additionally applies a dynamic weighting function to reduce the negative impact of "dirty data" in the training set, i.e., target sentences that do not match their source sentence.

Compared with the Transformer baseline, the approach improves BLEU by 1.36, 1.50, and 0.63 on the WMT14 English-to-German, WMT19 Chinese-to-English, and WMT14 English-to-French translation tasks, respectively. Human evaluation further confirms that the methods improve both translation adequacy and fluency.
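The Margin described above can be illustrated with a minimal sketch. It covers only the per-token subtraction stated in the abstract, not the MTO or MSO losses, whose exact form is not given here. The function names `token_margins` and `most_overconfident_position` and the probability values are hypothetical, chosen purely for illustration.

```python
from typing import List, Sequence


def token_margins(nmt_probs: Sequence[float], lm_probs: Sequence[float]) -> List[float]:
    """Per-token Margin: NMT probability minus LM probability.

    nmt_probs[t] is assumed to be the probability the NMT model assigns to
    the reference token at position t (given source and partial translation);
    lm_probs[t] is the probability the LM assigns to the same token
    (given the partial translation only).
    """
    if len(nmt_probs) != len(lm_probs):
        raise ValueError("nmt_probs and lm_probs must have the same length")
    return [p_nmt - p_lm for p_nmt, p_lm in zip(nmt_probs, lm_probs)]


def most_overconfident_position(margins: Sequence[float]) -> int:
    """Index of the smallest Margin, i.e. where the LM looks most overconfident."""
    if not margins:
        raise ValueError("margins must not be empty")
    return min(range(len(margins)), key=lambda t: margins[t])


# Made-up probabilities for a three-token target sentence.
nmt = [0.9, 0.6, 0.3]
lm = [0.2, 0.5, 0.7]

margins = token_margins(nmt, lm)
print([round(m, 2) for m in margins])
print(most_overconfident_position(margins))
```

In this toy input the third token has a negative Margin: the LM alone assigns the token a higher probability than the NMT model does, which is the situation the summary describes as LM overconfidence.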