Improving BERT with Self-Supervised Attention

Yiren Chen, Xiaoyu Kou, Jiangang Bai, Yunhai Tong
DOI: https://doi.org/10.1109/access.2021.3122273
IF: 3.9
2021-01-01
IEEE Access
Abstract: One of the most popular paradigms of applying large pre-trained NLP models such as BERT is to fine-tune them on a smaller dataset. However, one challenge remains: the fine-tuned model often overfits on smaller datasets. A symptom of this phenomenon is that irrelevant or misleading words in the sentence, which are easy for human beings to understand, can substantially degrade the performance of these fine-tuned BERT models. In this paper, we propose a novel technique, called Self-Supervised Attention (SSA), to help address this generalization challenge. Specifically, SSA automatically generates weak, token-level attention labels iteratively by probing the fine-tuned model from the previous iteration. We investigate two different ways of integrating SSA into BERT and propose a hybrid approach to combine their benefits. Empirically, on a variety of public datasets, we illustrate significant performance improvement using our SSA-enhanced BERT model.