Abstract: The attention mechanism, which selectively focuses on source-side information to learn a context vector for generating target words, has been shown to be an effective method for neural machine translation (NMT). In fact, generating target words depends not only on source-side information but also on target-side information. Although vanilla NMT can acquire target-side information implicitly through recurrent neural networks (RNNs), RNNs cannot adequately capture the global relationships among target-side words. To solve this problem, this paper proposes a novel target-attention approach that captures this information and thus enhances target word prediction in NMT. Specifically, we propose three variants of the target-attention model that directly obtain the global relationships among target words: 1) a forward target-attention model that uses a target attention mechanism to incorporate the previously generated target words into the prediction of the current target word; 2) a reverse target-attention model that adopts a reverse RNN to obtain information about the entire reversed target sequence and then combines it with source context information to generate the target sequence; and 3) a bidirectional target-attention model that combines the forward and reverse target-attention models, making full use of the target words to further improve NMT performance. Our methods can be integrated into both RNN-based and self-attention-based NMT, helping NMT obtain global target-side information to improve translation performance. Experiments on the NIST Chinese-to-English and WMT English-to-German translation tasks show that the proposed models achieve significant improvements over state-of-the-art baselines.
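To make the forward and bidirectional variants more concrete, the following is a minimal pure-Python sketch of how a target-side context vector could be computed at one decoding step and combined with the source context. It assumes dot-product scoring, equal vector dimensions for all states, and plain concatenation as the combination step. These choices, and all function names (`attend`, `forward_target_context`, `combined_context`), are illustrative assumptions rather than the paper's exact formulation. The `reverse_states` argument stands in for the reverse RNN's states over the target sequence; how those states are produced is outside this sketch.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], memory: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention (an assumed scoring function): weight each memory
    vector by softmax(query . m_j) and return the weighted sum."""
    weights = softmax([dot(query, m) for m in memory])
    dim = len(memory[0])
    return [sum(w * m[k] for w, m in zip(weights, memory)) for k in range(dim)]


def forward_target_context(state: Sequence[float],
                           history: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical forward target attention: attend over the decoder states of
    the previously generated target words y_1..y_{t-1}. At the first step there
    is no history, so a zero vector is returned."""
    if not history:
        return [0.0] * len(state)
    return attend(state, history)


def combined_context(state: Sequence[float],
                     source_annotations: Sequence[Sequence[float]],
                     target_history: Sequence[Sequence[float]],
                     reverse_states: Optional[Sequence[Sequence[float]]] = None) -> Vector:
    """Concatenate the source context with the forward target context. If
    reverse_states (hypothetical reverse-RNN states over the target sequence)
    are given, also attend over them, as in a bidirectional variant."""
    parts = attend(state, source_annotations) + forward_target_context(state, target_history)
    if reverse_states is not None:
        parts += attend(state, reverse_states)
    return parts
```

For example, `combined_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]])` returns a 4-dimensional vector whose last two entries are `[0.5, 0.5]`, because a single history state receives all of the target-attention weight. Concatenation keeps the source and target contexts separable for the output layer; a gated or summed combination would be an equally plausible alternative.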

Coarse-To-Fine Learning for Neural Machine Translation

Learning to Refine Source Representations for Neural Machine Translation

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding

Frequency-Aware Contrastive Learning for Neural Machine Translation

Neural System Combination For Machine Translation

Multilingual Neural Machine Translation with Language Clustering

A Hierarchy-to-Sequence Attentional Neural Machine Translation Model

Coarse-to-fine Few-shot Learning for Named Entity Recognition

Neural Machine Translation with Target-Attention Model

Deep Fusing Pre-trained Models into Neural Machine Translation

Unified Model Learning for Various Neural Machine Translation

Fine-grained attention mechanism for neural machine translation

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation

Enhanced Neural Machine Translation by Learning from Draft

Character-Aware Low-Resource Neural Machine Translation With Weight Sharing And Pre-Training

Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

Progressive Multi-Granularity Training for Non-Autoregressive Translation

Optimizing Attention Mechanism for Neural Machine Translation

Multi-channel Encoder for Neural Machine Translation

A Study of Multilingual Neural Machine Translation

Modeling Past and Future for Neural Machine Translation