Attention With Sparsity Regularization for Neural Machine Translation and Summarization

Jiajun Zhang,Yang Zhao,Haoran Li,Chengqing Zong
DOI: https://doi.org/10.1109/TASLP.2018.2883740
2019-03-01
Abstract:The attention mechanism has become the de facto standard component in neural sequence to sequence tasks, such as machine translation and abstractive summarization. It dynamically determines which parts in the input sentence should be focused on when generating each word in the output sequence. Ideally, only few relevant input words should be attended to at each decoding time step and the attention weight distribution should be sparse and sharp. However, previous methods have no good mechanism to control this attention weight distribution. In this paper, we propose a sparse attention model in which a sparsity regularization term is designed to augment the objective function. We explore two kinds of regularizations: $L_{\infty }$-norm regularization and minimum entropy regularization, both of which aim to sharpen the attention weight distribution. Extensive experiments on both neural machine translation and abstractive summarization demonstrate that our proposed sparse attention model can substantially outperform the strong baselines. And the detailed analyses reveal that the final attention distribution indeed becomes sparse and sharp.
What problem does this paper attempt to address?