Abstract: Based on a unified encoder-decoder framework with an attentional mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream approach in the machine translation community. Generally, NMT decoders produce translations from left to right. As a result, they exploit only the left-to-right target-side context of the generated translation, while the right-to-left target-side context goes completely unused. In this paper, we extend the conventional attentional encoder-decoder NMT framework with a backward decoder in order to explore asynchronous bidirectional decoding for NMT. After encoding, our backward decoder first learns to generate the target-side hidden states from right to left. Then, at each timestep of translation prediction, our forward decoder simultaneously attends to both the source-side hidden states and the reverse target-side hidden states through two attention models. Compared with previous models, this architecture allows our model to fully exploit contexts from both the source side and the target side, which together improve translation quality. We conducted experiments on the NIST Chinese-English and the WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model outperforms the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU/TER points on the three tasks, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU/TER points, respectively. Our code is released at https://github.com/DeepLearnXMU/ABD-NMT.
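The two-stage decoding described in the abstract (a backward pass over the target first, then a forward decoder that attends separately to the source states and to the reverse target-side states) can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the released ABD-NMT code: all helper names are invented, a toy `tanh(h + x)` recurrence stands in for the paper's recurrent cell, plain dot-product attention (with keys reused as values) is assumed, and the two context vectors are simply concatenated.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention; the keys double as values (an assumption)."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]


def toy_cell(h: Vector, x: Vector) -> Vector:
    """Hypothetical stand-in for an RNN cell: h' = tanh(h + x)."""
    return [math.tanh(hi + xi) for hi, xi in zip(h, x)]


def backward_states(
    target_embeddings: List[Vector], cell: Callable[[Vector, Vector], Vector]
) -> List[Vector]:
    """Backward decoder: scan the target right-to-left, then restore
    left-to-right order so that states[j] summarizes positions j..end."""
    h = [0.0] * len(target_embeddings[0])
    states = []
    for x in reversed(target_embeddings):
        h = cell(h, x)
        states.append(h)
    states.reverse()
    return states


def forward_context(
    s_prev: Vector, source_states: List[Vector], reverse_states: List[Vector]
) -> Vector:
    """One forward-decoder timestep: two separate attentions, one over the
    encoder states and one over the backward-decoder states, concatenated."""
    c_source = attend(s_prev, source_states)
    c_reverse = attend(s_prev, reverse_states)
    return c_source + c_reverse


if __name__ == "__main__":
    src = [[1.0, 0.0], [0.0, 1.0]]               # toy encoder states
    tgt = [[0.5, -0.5], [0.2, 0.1], [0.0, 0.3]]  # toy target embeddings
    rev = backward_states(tgt, toy_cell)
    ctx = forward_context([0.1, 0.2], src, rev)
    print(len(ctx))  # 4: a 2-dim source context plus a 2-dim reverse context
```

Keeping the two attentions separate, rather than pooling encoder and backward-decoder states into one memory, follows the abstract's mention of "two attention models"; how the paper actually fuses the two contexts is not specified in this summary.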

Cache Friendly Parallelization of Neural Encoder-Decoder Models Without Padding on Multi-core Architecture

Parallelizing and Optimizing Neural Encoder–Decoder Models Without Padding on Multi-Core Architecture

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

Chunk-Based Bi-Scale Decoder for Neural Machine Translation

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Fast Decoding in Sequence Models using Discrete Latent Variables

Improving Language Transfer Capability of Decoder-only Architecture in Multilingual Neural Machine Translation

Accelerating Transformer Inference for Translation via Parallel Decoding

Feedforward Sequential Memory Networks Based Encoder-Decoder Model for Machine Translation

Deconvolution-Based Global Decoding for Neural Machine Translation

Exploiting Reverse Target-Side Contexts for Neural Machine Translation Via Asynchronous Bidirectional Decoding

Multi-channel Encoder for Neural Machine Translation

Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation


Parallelizing non-linear sequential models over the sequence length


Learning to Remember Translation History with a Continuous Cache

MobileNMT: Enabling Translation in 15MB and 30ms

Improving Neural Machine Translation Model with Deep Encoding Information

Fast Structured Decoding for Sequence Models

nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training