SAC: Accelerating and Structuring Self-Attention Via Sparse Adaptive Connection.

Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Fei Wu, Jiwei Li
DOI: https://doi.org/10.48550/arxiv.2003.09833
2020-01-01
Abstract: While the self-attention mechanism has been widely used in a wide variety of tasks, it has the unfortunate property of a quadratic cost with respect to the input length, which makes it difficult to deal with long inputs. In this paper, we present a method for accelerating and structuring self-attentions: Sparse Adaptive Connection (SAC). In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes. In contrast with previous self-attention models with pre-defined structures (edges), the model learns to construct attention edges to improve task-specific performances. In this way, the model is able to select the most salient nodes and reduce the quadratic complexity regardless of the sequence length. Based on SAC, we show that previous variants of self-attention models are its special cases. Through extensive experiments on neural machine translation, language modeling, graph representation learning and image classification, we demonstrate SAC is competitive with state-of-the-art models while significantly reducing memory cost.
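As a rough illustration of "attention operations are performed between linked nodes", the sketch below computes attention only along a given edge list, so the cost scales with the number of edges rather than with the square of the sequence length. It is a minimal sketch under stated assumptions and not the authors' implementation: the function name `edge_attention` is hypothetical, the edges are supplied by hand rather than learned as in SAC, and queries, keys and values are all taken to be the raw node vectors with no learned projections.

```python
import numpy as np

def edge_attention(x, edges):
    """Attention restricted to directed edges (src -> dst); dst attends to src.

    x:     (n, d) array of node (token) representations.
    edges: list of (src, dst) index pairs, fixed by hand here (assumption;
           SAC learns which edges to build).
    Returns the (n, d) attended output and one attention weight per edge.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    src = np.array([s for s, _ in edges])
    dst = np.array([t for _, t in edges])

    # One scaled dot-product score per edge: O(len(edges)) work, not O(n^2).
    scores = np.einsum("ed,ed->e", x[dst], x[src]) / np.sqrt(d)

    # Numerically stable softmax over the incoming edges of each dst node.
    max_per_dst = np.full(n, -np.inf)
    np.maximum.at(max_per_dst, dst, scores)
    exp = np.exp(scores - max_per_dst[dst])
    denom = np.zeros(n)
    np.add.at(denom, dst, exp)
    weights = exp / denom[dst]

    # Weighted sum of source vectors; nodes with no incoming edge stay zero.
    out = np.zeros_like(x)
    np.add.at(out, dst, weights[:, None] * x[src])
    return out, weights
```

With the full edge set (every pair of nodes, including self-loops) this reduces to ordinary dense self-attention, which mirrors the abstract's point that earlier variants correspond to particular choices of edges.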