A Pointer-Generator Based Abstractive Summarization Model with Knowledge Distillation

Tao Dong, Shimin Shan, Yu Liu, Yue Qian, Anqi Ma
DOI: https://doi.org/10.1007/978-3-030-92307-5_20
2021-01-01
Abstract: The use of large-scale pre-trained models for text summarization has attracted increasing attention in the computer science community. However, pre-trained models with millions of parameters and long training times are difficult to deploy. Furthermore, pre-trained models focus on understanding language but neglect the reproduction of factual details when generating text. In this paper, we propose a method for text summarization that applies knowledge distillation to a pre-trained model, called the teacher model. We build a novel sequence-to-sequence model as the student model, which learns to imitate the teacher model’s knowledge. Specifically, we propose a variant of the pointer-generator network that integrates a multi-head attention mechanism, a coverage mechanism, and a copy mechanism. We apply this variant to our student model to address the word repetition and out-of-vocabulary word problems, thereby improving the quality of generation. In experiments on the Gigaword and Weibo datasets, our model achieves better performance and takes less time than the baseline models.
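The mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact equations, so the code below assumes the standard pointer-generator formulation (a generation probability mixing the vocabulary distribution with the attention-based copy distribution, plus a coverage penalty) and a temperature-softened distillation loss. All function names, parameters, and the temperature value are hypothetical, and the multi-head attention component is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened output.

    Assumption: a softened-softmax distillation objective; the paper's
    actual loss may differ.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


def final_distribution(p_gen, vocab_dist, attention, source_ids, extended_size):
    """Copy mechanism: mix generating and copying over an extended vocabulary.

    Out-of-vocabulary source words are assumed to have ids >= len(vocab_dist),
    so they can only receive probability mass by being copied.
    """
    padding = [0.0] * (extended_size - len(vocab_dist))
    mixed = [p_gen * p for p in vocab_dist] + padding
    for attn, token_id in zip(attention, source_ids):
        mixed[token_id] += (1.0 - p_gen) * attn
    return mixed


def coverage_loss(attention, coverage):
    """Coverage mechanism: penalise re-attending to already covered positions.

    `coverage` is the running sum of attention from earlier decoding steps.
    """
    return sum(min(a, c) for a, c in zip(attention, coverage))


# Toy decoding step: 3-word vocabulary, 2 source tokens, the second is OOV (id 3).
mixed = final_distribution(
    p_gen=0.7,
    vocab_dist=[0.5, 0.3, 0.2],
    attention=[0.6, 0.4],
    source_ids=[1, 3],
    extended_size=4,
)
```

In this toy step the out-of-vocabulary source token (id 3) receives non-zero probability only through copying, which is how a copy mechanism handles unseen words, while the coverage term grows when the decoder attends again to a position it has already used, discouraging repetition.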