Fine-tune BERT with Sparse Self-Attention Mechanism.

Baiyun Cui, Yingming Li, Ming Chen, Zhongfei Zhang
DOI: https://doi.org/10.18653/v1/d19-1361
2019-01-01
Abstract: In this paper, we develop a novel Sparse Self-Attention Fine-tuning model (referred to as SSAF), which integrates sparsity into the self-attention mechanism to enhance the fine-tuning performance of BERT. In particular, sparsity is introduced into self-attention by replacing the softmax function with a controllable sparse transformation when fine-tuning BERT. This enables us to learn a structurally sparse attention distribution, which leads to a more interpretable representation of the whole input. The proposed model is evaluated on sentiment analysis, question answering, and natural language inference tasks. Extensive experimental results across multiple datasets demonstrate its effectiveness and its superiority over the baseline methods.
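To make the idea of swapping softmax for a sparse transformation concrete, here is a minimal sketch in pure Python. The abstract does not name the transformation SSAF uses, so sparsemax (a Euclidean projection of the scores onto the probability simplex) is substituted here as a well-known sparse alternative. The function names and the `lam` parameter, which simply rescales the scores by 1 / (1 - lam) to control how sparse the output is, are assumptions made for illustration and may differ from the paper's formulation.

```python
import math


def softmax(scores):
    """Dense baseline: every position receives a strictly positive weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores, lam=0.0):
    """Sparse alternative to softmax (illustrative stand-in, not SSAF itself).

    `lam` is a hypothetical sparsity knob: scores are rescaled by 1 / (1 - lam)
    before projection, so values closer to 1 give sparser outputs and negative
    values give denser ones. lam = 0 is plain sparsemax.
    """
    if lam >= 1.0:
        raise ValueError("lam must be strictly less than 1")
    z = [s / (1.0 - lam) for s in scores]

    # Find the threshold tau such that sum(max(z_i - tau, 0)) == 1.
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k  # position k is still in the support
        else:
            break  # all remaining positions fall outside the support

    # Scores below the threshold are clipped to exactly zero.
    return [max(v - tau, 0.0) for v in z]


if __name__ == "__main__":
    attention_scores = [1.0, 0.8, 0.1]  # toy scores for one query over 3 keys
    print("softmax:   ", softmax(attention_scores))
    print("sparsemax: ", sparsemax(attention_scores))
    print("lam = -1.0:", sparsemax(attention_scores, lam=-1.0))
```

On the toy scores, softmax spreads weight over all three positions, while sparsemax assigns exactly zero to the lowest-scoring one; lowering `lam` to -1.0 brings that position back into the support. In an attention layer, a transformation of this kind would be applied to each row of scaled query-key scores in place of softmax.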