Low-Rank and Locality Constrained Self-Attention for Sequence Modeling.

Qipeng Guo, Xipeng Qiu, Xiangyang Xue, Zheng Zhang
DOI: https://doi.org/10.1109/TASLP.2019.2944078
2019-01-01
Abstract: Self-attention mechanisms have become increasingly popular in natural language processing (NLP) applications. Recent studies show that the Transformer architecture, which relies mainly on the attention mechanism, achieves much success on large datasets. However, its generalization ability is weaker than that of CNNs and RNNs on many moderate-sized datasets. We think the reason can be attributed to its u...