Accelerating BERT Inference for Sequence Labeling Via Early-Exit.

Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang
DOI: https://doi.org/10.18653/v1/2021.acl-long.16
2021-01-01
Abstract: Both performance and efficiency are crucial factors for sequence labeling tasks in many real-world scenarios. Although pre-trained models (PTMs) have significantly improved the performance of various sequence labeling tasks, their computational cost is high. To alleviate this problem, we extend the recently successful early-exit mechanism to accelerate the inference of PTMs for sequence labeling tasks. However, existing early-exit mechanisms are designed specifically for sequence-level tasks rather than sequence labeling. In this paper, we first propose SENTEE: a simple extension of SENTence-level Early-Exit for sequence labeling tasks. To further reduce computational cost, we also propose TOKEE: a TOKen-level Early-Exit mechanism that allows some of the tokens to exit early at different layers. Considering the local dependency inherent in sequence labeling, we employ a window-based criterion to decide whether a token should exit. Token-level early exit introduces a gap between training and inference, so we add an extra self-sampling fine-tuning stage to alleviate it. Extensive experiments on three popular sequence labeling tasks show that our approach can save up to 66%~75% of inference cost with minimal performance degradation. Compared with competitive compressed models such as DistilBERT, our approach achieves better performance under the same speed-up ratios of 2x, 3x, and 4x.
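The window-based exit criterion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the uncertainty measure, the window size, or the threshold, so everything below is an assumption made for illustration: uncertainty is taken to be the normalized entropy of a token's predicted label distribution, a token may exit only if every token within `window` positions of it is below `threshold`, and the names `entropy` and `window_exit_mask` are hypothetical.

```python
import math


def entropy(probs):
    """Normalized Shannon entropy of one token's label distribution, in [0, 1].

    Assumption: entropy is used as the uncertainty measure; the abstract
    does not say which measure the authors use.
    """
    n = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(n) if n > 1 else 0.0


def window_exit_mask(token_probs, window=1, threshold=0.2):
    """Return one boolean per token: True if the token may exit at this layer.

    A token exits only when the most uncertain token in its local window
    (the token itself plus `window` neighbors on each side) is still below
    `threshold`. Both parameters are hypothetical defaults.
    """
    uncertainty = [entropy(p) for p in token_probs]
    mask = []
    for i in range(len(uncertainty)):
        lo = max(0, i - window)
        hi = min(len(uncertainty), i + window + 1)
        mask.append(max(uncertainty[lo:hi]) < threshold)
    return mask


if __name__ == "__main__":
    # Five tokens, three labels; the third token is ambiguous.
    probs = [
        [0.98, 0.01, 0.01],
        [0.97, 0.02, 0.01],
        [0.40, 0.30, 0.30],
        [0.99, 0.005, 0.005],
        [0.99, 0.005, 0.005],
    ]
    print(window_exit_mask(probs, window=1))
    print(window_exit_mask(probs, window=0))
```

With `window=0` each token is judged on its own confidence alone; a larger window also holds back confident tokens that sit next to an uncertain neighbor, which is one way to respect the local dependency between adjacent labels.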