Abstract: The availability of very high computational power, together with advances in deep neural network (DNN) technology, has enabled rapid progress in machine translation. The strong representation ability of deep neural networks allows neural machine translation (NMT) to exploit large-scale bilingual parallel corpora and the available computing power to build highly effective translation models. Nevertheless, existing NMT models use only the output of the top encoder layer, while the information available in the remaining encoder layers is often ignored, which significantly constrains translation performance. To address this issue, we propose a novel NMT model that fully exploits deep encoding information. The core idea is to aggregate the information from different encoder layers. We design three aggregation strategies: parallel-layer, multi-layer, and dynamic-layer encoding information aggregation. Three translation models are trained accordingly and compared with the baseline Transformer model on a Chinese-to-English translation task. The experimental results show that the proposed model improves the BLEU-4 score by 0.89 over the baseline, which demonstrates the effectiveness of the proposed method.
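The general idea of combining several encoder layers, rather than reading only the top one, can be illustrated with a minimal sketch. The abstract does not describe how the parallel-layer, multi-layer, or dynamic-layer strategies are computed, so the code below is not the paper's method: it assumes a generic softmax-weighted sum over layer outputs, and the names `aggregate_layers`, `top_layer_only`, and `layer_scores` are hypothetical.

```python
import math


def softmax(scores):
    """Turn unnormalised scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_layer_only(layer_outputs):
    """Baseline behaviour: pass only the last encoder layer to the decoder."""
    return layer_outputs[-1]


def aggregate_layers(layer_outputs, layer_scores):
    """Mix all encoder layers with softmax-normalised weights.

    layer_outputs: list of L layers, each a list of T token vectors of size D.
    layer_scores:  list of L unnormalised scores (assumed to be learned in a
                   real model; fixed here for illustration).
    Returns a list of T mixed token vectors of size D.
    """
    if len(layer_outputs) != len(layer_scores):
        raise ValueError("need exactly one score per encoder layer")
    weights = softmax(layer_scores)
    num_tokens = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    mixed = [[0.0] * dim for _ in range(num_tokens)]
    for weight, layer in zip(weights, layer_outputs):
        for t, vector in enumerate(layer):
            for d, value in enumerate(vector):
                mixed[t][d] += weight * value
    return mixed


# Toy example: two encoder layers, one token, two hidden dimensions.
layers = [
    [[1.0, 2.0]],  # lower layer
    [[3.0, 4.0]],  # top layer
]
equal_mix = aggregate_layers(layers, [0.0, 0.0])
```

With equal scores, every layer contributes the same share to the mixed representation; as the score of the top layer grows relative to the others, the result approaches the top-layer-only baseline.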

Improving Neural Machine Translation with Pre-trained Representation

Improving Neural Machine Translation Models with Monolingual Data

Exploiting Monolingual Data at Scale for Neural Machine Translation

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation

Improving Non-autoregressive Neural Machine Translation with Monolingual Data

Towards Making the Most of Context in Neural Machine Translation

Joint Training for Neural Machine Translation Models with Monolingual Data

Improving Multilingual Translation by Representation and Gradient Regularization

Neural Machine Translation with Joint Representation

Incorporating Pre-trained Model into Neural Machine Translation

Improving Neural Machine Translation Model with Deep Encoding Information

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation

Neural System Combination For Machine Translation

Neural Machine Translation with Monolingual Translation Memory

Exploiting Cross-Sentence Context for Neural Machine Translation

Modeling Past and Future for Neural Machine Translation

Improving Neural Machine Translation by Achieving Knowledge Transfer with Sentence Alignment Learning

Reciprocal Supervised Learning Improves Neural Machine Translation

Towards Making the Most of BERT in Neural Machine Translation

Document-level Neural Machine Translation Using BERT As Context Encoder