Attention-based Bidirectional Long Short-Term Memory Networks for Relation Classification Using Knowledge Distillation from BERT

Zihan Wang, Bo Yang
DOI: https://doi.org/10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00100
2020-01-01
Abstract: Relation classification is an important task in natural language processing. Today the best-performing models often use huge, transformer-based neural architectures such as BERT and XLNet, with hundreds of millions of network parameters. These large neural networks have led to the belief that the shallow neural networks of the previous generation are obsolete for relation classification. However, because of their large size and low inference speed, these models may be impractical in online real-time systems or resource-restricted systems. To address this issue, we try to accelerate these well-performing language models by compressing them. Specifically, we distill knowledge for relation classification from a huge, transformer-based language model, BERT, into an Attention-Based Bidirectional Long Short-Term Memory Network. We evaluate our model on the SemEval-2010 relation classification task. According to the experimental results, our model outperforms other LSTM-based methods and comes close to the performance of BERT. In terms of efficiency, our model has 157 times fewer network parameters than BERT and, as a result, requires about 229 times less inference time.
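The abstract does not state the exact training objective used for distillation. As a rough illustration only, the sketch below assumes the common soft-target formulation, in which the student (here, the BiLSTM) is trained on a weighted sum of the usual cross-entropy against the gold label and a temperature-scaled KL divergence against the teacher (BERT) output distribution. The function names, the `temperature` and `alpha` parameters, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed soft-target distillation objective (not the paper's stated loss).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between teacher and student soft distributions.
    """
    # Hard-label term: cross-entropy of the student against the gold relation.
    hard = -math.log(softmax(student_logits)[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)

    # The temperature**2 factor keeps the soft-term gradients on a
    # comparable scale as the temperature changes.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits for a 3-way relation classification example.
teacher = [3.0, 0.5, -1.0]
student = [1.5, 1.0, -0.5]
loss = distillation_loss(student, teacher, label=0)
```

In this formulation a student whose logits match the teacher's incurs no soft-term penalty, so the loss reduces to the weighted hard-label cross-entropy.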