Neural Machine Translation With GRU-Gated Attention Model

Biao Zhang, Deyi Xiong, Jun Xie, Jinsong Su
DOI: https://doi.org/10.1109/tnnls.2019.2957276
Journal impact factor: 14.255
2020-11-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: Neural machine translation (NMT) heavily relies on context vectors generated by an attention network to predict target words. In practice, we observe that the context vectors for different target words are quite similar to one another and translations with such nondiscriminatory context vectors tend to be degenerative. We ascribe this similarity to the invariant source representations that lack dynamics across decoding steps. In this article, we propose a novel gated recurrent unit (GRU)-gated attention model (GAtt) for NMT. By updating the source representations with the previous decoder state via a GRU, GAtt enables translation-sensitive source representations that then contribute to discriminative context vectors. We further propose a variant of GAtt by swapping the input order of the source representations and the previous decoder state to the GRU. Experiments on the NIST Chinese–English, WMT14 English–German, and WMT17 English–German translation tasks show that the two GAtt models achieve significant improvements over the vanilla attention-based NMT. Further analyses on the attention weights and context vectors demonstrate the effectiveness of GAtt in enhancing the discriminating capacity of representations and handling the challenging issue of overtranslation.
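The core mechanism described in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`init_gru`, `gru_cell`, `gated_source`, `init_attention`, `additive_attention`) are hypothetical, biases are omitted, weights are random, and the attention score is assumed to be the usual Bahdanau-style additive form. We also assume that in GAtt each source annotation acts as the GRU hidden state with the previous decoder state as the GRU input, and that the variant simply swaps these two roles; the abstract states only that the input order is swapped.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(rng, d_in, d_h):
    """Random GRU weights (hypothetical initialisation; biases omitted for brevity)."""
    shapes = {"Wz": (d_h, d_in), "Uz": (d_h, d_h),
              "Wr": (d_h, d_in), "Ur": (d_h, d_h),
              "Wh": (d_h, d_in), "Uh": (d_h, d_h)}
    return {k: rng.normal(0.0, 0.1, size=s) for k, s in shapes.items()}


def gru_cell(p, x, h):
    """One standard GRU step applied row-wise. x: (n, d_in), h: (n, d_h) -> (n, d_h)."""
    z = sigmoid(x @ p["Wz"].T + h @ p["Uz"].T)           # update gate
    r = sigmoid(x @ p["Wr"].T + h @ p["Ur"].T)           # reset gate
    cand = np.tanh(x @ p["Wh"].T + (r * h) @ p["Uh"].T)  # candidate state
    return (1.0 - z) * h + z * cand


def gated_source(p, H, s_prev, swap=False):
    """Refresh every source annotation H[j] with the previous decoder state s_prev.

    swap=False (assumed GAtt form): H[j] is the GRU hidden state, s_prev is its input.
    swap=True  (assumed variant):   s_prev is the GRU hidden state, H[j] is its input.
    """
    S = np.tile(s_prev, (H.shape[0], 1))  # copy s_prev to each of the n source positions
    return gru_cell(p, H, S) if swap else gru_cell(p, S, H)


def init_attention(rng, d_dec, d_key, d_att):
    return (rng.normal(0.0, 0.1, size=(d_att, d_dec)),  # Wa: projects s_prev
            rng.normal(0.0, 0.1, size=(d_att, d_key)),  # Ua: projects (updated) annotations
            rng.normal(0.0, 0.1, size=d_att))            # va: scoring vector


def additive_attention(params, s_prev, K):
    """Additive attention over K (n, d_key); returns weights (n,) and context (d_key,)."""
    Wa, Ua, va = params
    scores = np.tanh(s_prev @ Wa.T + K @ Ua.T) @ va
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return alpha, alpha @ K


rng = np.random.default_rng(0)
n, d_src, d_dec, d_att = 5, 8, 6, 4
H = rng.normal(size=(n, d_src))                  # encoder annotations (fixed)
gru_gatt = init_gru(rng, d_in=d_dec, d_h=d_src)  # GAtt: hidden = h_j
gru_swap = init_gru(rng, d_in=d_src, d_h=d_dec)  # variant: hidden = s_prev
att_gatt = init_attention(rng, d_dec, d_src, d_att)
att_swap = init_attention(rng, d_dec, d_dec, d_att)

s1, s2 = rng.normal(size=d_dec), rng.normal(size=d_dec)  # two decoding steps
H1, H2 = gated_source(gru_gatt, H, s1), gated_source(gru_gatt, H, s2)
alpha1, c1 = additive_attention(att_gatt, s1, H1)
alpha2, c2 = additive_attention(att_gatt, s2, H2)
Hs = gated_source(gru_swap, H, s1, swap=True)
alpha_s, c_s = additive_attention(att_swap, s1, Hs)

print(c1.shape, c_s.shape)  # (8,) (6,)
print(np.allclose(H1, H2))  # False: the source representations now differ per step
```

Because `H1` and `H2` depend on the decoder state, the context vectors at different steps are built from different source representations, whereas vanilla attention reuses the same `H` at every step. Under the role assignment assumed here, the variant's refreshed annotations (and hence its context vector) take the decoder-state dimension rather than the encoder dimension.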
Subject categories: computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture