Knowledge Distillation with Source-free Unsupervised Domain Adaptation for BERT Model Compression

Jing Tian, Juan Chen, Ningjiang Chen, Lin Bai, Suqun Huang
DOI: https://doi.org/10.1109/CSCWD57460.2023.10152760
2023-01-01
Abstract: The pre-trained language model BERT has brought significant performance improvements to a range of natural language processing tasks, but its large size makes it difficult to deploy in many practical application scenarios. With the continued development of edge computing, deploying models on resource-constrained edge devices has become a trend. In a distributed edge environment, a critical task is to account for data distribution differences, labeling costs, and privacy while the model is being compressed. This paper proposes a new BERT distillation method with source-free unsupervised domain adaptation. By combining source-free unsupervised domain adaptation with knowledge distillation for optimization, the method improves the performance of the BERT model on cross-domain data. In an experimental evaluation on a cross-domain sentiment analysis task, the method improves average prediction accuracy by up to about 4% compared with other methods.
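The abstract does not state which loss functions the method uses. As a minimal sketch only, assuming a standard soft-target distillation loss (temperature-scaled KL divergence from the teacher, e.g. the full BERT model, to the smaller student) and, for the source-free adaptation side, an information-maximization objective of the kind used in the SHOT method (low per-sample entropy, high entropy of the batch-average prediction), a combined objective on unlabeled target-domain logits might look like the code below. All function names, the weighting `alpha`, and the default temperature are hypothetical illustrations, not the authors' implementation; pure Python is used so the arithmetic is easy to follow, whereas a real implementation would use a tensor library.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature: float = 2.0) -> float:
    """Soft-target distillation loss: T^2 * KL(p_teacher || p_student), averaged over the batch."""
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * total / len(teacher_logits)

def info_max_loss(student_logits) -> float:
    """Information-maximization term (SHOT-style assumption):
    mean per-sample entropy minus the entropy of the batch-average prediction.
    Minimizing it favors confident yet class-diverse predictions without labels."""
    probs = [softmax(s) for s in student_logits]
    n, k = len(probs), len(probs[0])
    cond_ent = sum(-sum(p * math.log(p) for p in row if p > 0) for row in probs) / n
    mean = [sum(row[j] for row in probs) / n for j in range(k)]
    marg_ent = -sum(p * math.log(p) for p in mean if p > 0)
    return cond_ent - marg_ent

def combined_loss(teacher_logits, student_logits,
                  temperature: float = 2.0, alpha: float = 0.5) -> float:
    """Hypothetical weighted sum of the two terms; alpha is an illustrative hyperparameter."""
    return (alpha * kd_loss(teacher_logits, student_logits, temperature)
            + (1 - alpha) * info_max_loss(student_logits))

if __name__ == "__main__":
    teacher = [[2.0, -1.0], [-0.5, 1.5]]   # toy two-class sentiment logits
    student = [[1.0, 0.0], [0.0, 1.0]]
    print(combined_loss(teacher, student))
```

The distillation term needs only the teacher's outputs on target-domain text, and the information-maximization term needs only the student's own predictions, so neither requires source data or target labels. That property is what makes such a pairing plausible for the source-free, privacy-sensitive edge setting the abstract describes.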
What problem does this paper attempt to address?