Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation

Zhebin Zhang, Sai Wu, Gang Chen, Dawei Jiang
DOI: https://doi.org/10.1109/icbk50248.2020.00057
2020-01-01
Abstract: In sequence-to-sequence learning, models based on the self-attention mechanism dominate the network structures used for neural machine translation. Recently, convolutional networks have also been shown to perform well on various translation tasks. Although self-attention and convolution have different strengths in modeling sequences, few efforts have been devoted to combining them. In this work, we propose a hybrid model that benefits from both mechanisms. We combine a self-attention module and a dynamic convolution module by taking a weighted sum of their outputs, where the weights are learned by the model during training. Experimental results show that our hybrid model outperforms baseline models built solely on either of these two mechanisms. It also achieves new state-of-the-art results on the IWSLT'15 English-German dataset.
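The weighted-sum combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are parameterized, so this example assumes two learnable scalars normalized with a softmax; the names `hybrid_combine` and `gate_logits` are hypothetical, and the two branch outputs are stand-in lists rather than real self-attention or dynamic convolution outputs. It is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_combine(attn_out, conv_out, gate_logits):
    """Weighted sum of two branch outputs with the same (seq_len x d_model) shape.

    Assumption: gate_logits holds two scalars that would be trained with the
    rest of the model; softmax keeps the mixing weights positive and summing to 1.
    """
    if len(attn_out) != len(conv_out):
        raise ValueError("branch outputs must have the same sequence length")
    w_attn, w_conv = softmax(gate_logits)
    return [
        [w_attn * a + w_conv * c for a, c in zip(row_a, row_c)]
        for row_a, row_c in zip(attn_out, conv_out)
    ]


# Stand-in outputs of the two branches for a 2-token sequence with d_model = 2.
attn_out = [[1.0, 2.0], [3.0, 4.0]]
conv_out = [[5.0, 6.0], [7.0, 8.0]]

# Equal logits give each branch the same weight.
mixed = hybrid_combine(attn_out, conv_out, [0.0, 0.0])
```

With equal logits the result is the element-wise average of the two branches; as training moves the logits apart, the output shifts toward whichever branch receives the larger weight.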