Abstract: Based on a unified encoder-decoder framework with an attention mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream approach in the machine translation community. NMT decoders generally produce translations in a left-to-right way. As a result, only the left-to-right target-side contexts of the generated translations are exploited, while the right-to-left target-side contexts go entirely unused. In this paper, we extend the conventional attentional encoder-decoder NMT framework by introducing a backward decoder, in order to explore asynchronous bidirectional decoding for NMT. In the first step after encoding, our backward decoder learns to generate the target-side hidden states in a right-to-left manner. Next, at each timestep of translation prediction, our forward decoder concurrently considers both the source-side and the reverse target-side hidden states via two attention models. Compared with previous models, this architecture enables our model to fully exploit contexts from both the source side and the target side, which together improve translation quality. We conducted experiments on NIST Chinese-English, WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model significantly outperforms the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU and TER points, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU and TER points, respectively. We released our code at https://github.com/DeepLearnXMU/ABD-NMT.
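The forward decoding step described in the abstract, in which one decoder state attends to both the source-side hidden states and the hidden states produced by the backward decoder, can be illustrated with a minimal sketch. This is not the released implementation: the function names (`attend`, `dual_context`), the toy vectors, the use of plain dot-product scoring, and the concatenation of the two contexts are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention (an assumed, simplified scoring function).

    Returns the weighted average of `states` and the attention weights.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return context, weights


def dual_context(query, source_states, backward_states):
    """Hypothetical helper: run two separate attentions for one forward step.

    One attention reads the encoder (source-side) states, the other reads
    the right-to-left target-side states from the backward decoder. The two
    context vectors are concatenated here, which is an assumption.
    """
    src_ctx, src_weights = attend(query, source_states)
    bwd_ctx, bwd_weights = attend(query, backward_states)
    return src_ctx + bwd_ctx, src_weights, bwd_weights


# Toy inputs (made up): three source states, two backward-decoder states.
source_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backward_states = [[0.5, 0.5], [0.0, 2.0]]
query = [1.0, 0.0]  # current forward-decoder state

combined, src_w, bwd_w = dual_context(query, source_states, backward_states)
print(len(combined), src_w, bwd_w)
```

In this sketch the backward states are computed once, before forward decoding begins, and are then treated as a fixed memory, which mirrors the asynchronous ordering described in the abstract.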

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Mutual Information and Diverse Decoding Improve Neural Machine Translation

Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation

Shared-Private Bilingual Word Embeddings for Neural Machine Translation

Transductive Ensemble Learning for Neural Machine Translation

Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Effective Cross-lingual Transfer of Neural Machine Translation Models without Shared Vocabularies

Balancing Cost and Benefit with Tied-Multi Transformers

Multi-channel Encoder for Neural Machine Translation

Layer-Wise Coordination Between Encoder and Decoder for Neural Machine Translation

Improving Zero-shot Neural Machine Translation on Language-specific Encoders-Decoders

Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation

Multi-Unit Transformers for Neural Machine Translation

Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models

Exploiting Reverse Target-Side Contexts for Neural Machine Translation Via Asynchronous Bidirectional Decoding

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

Multi-split Reversible Transformers Can Enhance Neural Machine Translation

Sharing Attention Weights for Fast Transformer

Exploiting Monolingual Data at Scale for Neural Machine Translation

Improving Neural Machine Translation Model with Deep Encoding Information