Abstract: The attention mechanism, which selectively focuses on source-side information to learn a context vector for generating target words, has proven effective for neural machine translation (NMT). However, generating a target word depends not only on source-side information but also on target-side information. Although vanilla NMT acquires target-side information implicitly through recurrent neural networks (RNNs), an RNN cannot adequately capture the global relationships among target-side words. To address this problem, this paper proposes a novel target-attention approach that captures this information and thereby improves target word prediction in NMT. Specifically, we propose three variants of the target-attention model that directly model the global relationships among target words: 1) a forward target-attention model, which uses a target attention mechanism to incorporate previously generated target words into the prediction of the current target word; 2) a reverse target-attention model, which uses a reverse RNN to encode the entire target sequence in reverse order and combines this information with the source context to generate the target sequence; and 3) a bidirectional target-attention model, which combines the forward and reverse target-attention models to make full use of the target words and further improve NMT performance. Our methods can be integrated into both RNN-based and self-attention-based NMT, giving the model access to global target-side information and improving translation performance. Experiments on the NIST Chinese-to-English and WMT English-to-German translation tasks show that the proposed models achieve significant improvements over state-of-the-art baselines.
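As a rough illustration of the forward target-attention idea described in the abstract, the sketch below attends over the states of previously generated target words and returns a target-side context vector. The abstract does not specify a scoring function or how the target context is fused with the source context, so the dot-product scoring, the concatenation step, and the names `forward_target_attention` and `combine` are all assumptions made for illustration, not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward_target_attention(query, history):
    """Attend over the states of previously generated target words.

    query:   current decoder state (list of floats)
    history: states of earlier target words (list of equal-length lists)
    Returns (weights, context). Dot-product scoring is an assumption.
    """
    if not history:
        # No target words generated yet: no weights, zero context vector.
        return [], [0.0] * len(query)
    scores = [sum(q * h for q, h in zip(query, state)) for state in history]
    weights = softmax(scores)
    context = [
        sum(w * state[d] for w, state in zip(weights, history))
        for d in range(len(query))
    ]
    return weights, context


def combine(source_context, target_context):
    """Hypothetical fusion step: concatenate source and target contexts."""
    return source_context + target_context


# Toy example: two earlier target states, query aligned with the first one.
history = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
weights, target_ctx = forward_target_attention(query, history)
features = combine([0.5, 0.5], target_ctx)
```

In this toy example the first history state receives the larger weight because it has the larger dot product with the query. A real model would use learned projections and feed `features` into the output layer.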

Modeling Coverage for Neural Machine Translation

Coverage-based Neural Machine Translation

Coverage Embedding Models for Neural Machine Translation

Modeling Coverage for Non-Autoregressive Neural Machine Translation

A Simple And Effective Approach To Coverage-Aware Neural Machine Translation

Learning When to Attend for Neural Machine Translation

Neural Machine Translation with Target-Attention Model

On the Language Coverage Bias for Neural Machine Translation

Optimizing Attention Mechanism for Neural Machine Translation

Modeling Past and Future for Neural Machine Translation

Neural Machine Translation with Recurrent Attention Modeling

History Attention for Source-Target Alignment in Neural Machine Translation

Neural Machine Translation with Supervised Attention

Interactive Attention for Neural Machine Translation

Neural Machine Translation with Deep Attention

Syntax-Directed Attention for Neural Machine Translation

Temporal Attention Model for Neural Machine Translation

Universal Vector Neural Machine Translation With Effective Attention

Neural Machine Translation Advised by Statistical Machine Translation

Bilingual Attention Based Neural Machine Translation

A Simple and Effective Approach to Coverage-Aware Neural Machine Translation Supplementary Material