Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean

Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. In a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
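
The abstract mentions length normalization and a coverage penalty in beam search but does not state the formulas. The following is a minimal sketch of one way such a rescoring could look, assuming the form usually attributed to the full paper: the hypothesis log-probability is divided by a length penalty ((5 + |Y|) / 6)^alpha, and beta times the sum over source positions of log(min(total attention, 1)) is added. The function names, the default values of `alpha` and `beta`, and the small `eps` floor are illustrative assumptions, not settings taken from the abstract.

```python
import math


def length_penalty(target_length, alpha=0.2):
    """Length normalizer; equals 1.0 for a one-token hypothesis.

    alpha is an assumed, illustrative value (alpha = 0 disables normalization).
    """
    return ((5.0 + target_length) ** alpha) / ((5.0 + 1.0) ** alpha)


def coverage_penalty(attention, beta=0.2, eps=1e-9):
    """Penalize source positions that received little total attention.

    attention[j][i] is the attention weight that target step j puts on
    source position i. eps is an assumption added here to avoid log(0).
    """
    num_source = len(attention[0])
    total = 0.0
    for i in range(num_source):
        covered = sum(step[i] for step in attention)
        total += math.log(max(min(covered, 1.0), eps))
    return beta * total


def rescore(log_prob, attention, alpha=0.2, beta=0.2):
    """Combine model log-probability, length penalty, and coverage penalty."""
    target_length = len(attention)
    return (log_prob / length_penalty(target_length, alpha)
            + coverage_penalty(attention, beta))


if __name__ == "__main__":
    # Two hypotheses with the same log-probability and length:
    # one attends to both source words, the other ignores the second word.
    full = [[1.0, 0.0], [0.0, 1.0]]
    partial = [[1.0, 0.0], [1.0, 0.0]]
    print(rescore(-2.0, full), rescore(-2.0, partial))
```

With equal model scores, the hypothesis whose attention covers every source position receives the higher final score, which is the behavior the abstract describes for the coverage penalty.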

Lexicons and Minimum Risk Training for Neural Machine Translation: NAIST-CMU at WAT2016

Neural Machine Translation by Minimising the Bayes-risk with Respect to Syntactic Translation Lattices

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT

The University of Edinburgh's Neural MT Systems for WMT17

High Quality Rather than High Model Probability: Minimum Bayes Risk Decoding with Neural Metrics

Edinburgh Neural Machine Translation Systems for WMT 16

SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

QCRI Machine Translation Systems for IWSLT 16

NICT's Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task

Adversarial Neural Machine Translation

Meta Ensemble for Japanese-Chinese Neural Machine Translation: Kyoto-U+ECNU Participation to WAT 2020

Bilingual Attention Based Neural Machine Translation

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Neural Machine Translation Based on Improved Actor-Critic Method

NICT's Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task

Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning

Neural Network Transduction Models in Transliteration Generation

NAIST Simultaneous Speech Translation System for IWSLT 2024

Training With Additional Semantic Constraints For Enhancing Neural Machine Translation