Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.

Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li
DOI: https://doi.org/10.21437/interspeech.2017-1079
2017-01-01
Abstract: This paper proposes an approach to distill knowledge from an ensemble of models into a single deep neural network (DNN) student model for punctuation prediction. The approach trains the DNN student model to mimic the behavior of the ensemble, which consists of three single models. The student is trained by minimizing the Kullback-Leibler (KL) divergence between its output distribution and that of the ensemble. Experimental results on the English IWSLT2011 dataset show that the ensemble outperforms the previous state-of-the-art model by up to 4.0% absolute in overall F1-score. The DNN student model also achieves up to 13.4% absolute improvement in overall F1-score over the conventionally trained baseline models.
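The distillation objective described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three ensemble members are combined, so averaging their posteriors is an assumption here, as are the four-class label set, the example logits, and all function names; this is not the authors' implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_average(teacher_probs):
    """Assumed combination rule: average the teachers' posteriors class by class."""
    n = len(teacher_probs)
    return [sum(col) / n for col in zip(*teacher_probs)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical label set: no punctuation, comma, period, question mark.
# The logits below are made-up values for a single word position.
teachers = [
    softmax([2.0, 1.0, 0.1, -1.0]),
    softmax([1.5, 1.2, 0.0, -0.5]),
    softmax([2.2, 0.8, 0.3, -1.2]),
]
target = ensemble_average(teachers)      # soft target from the ensemble
student = softmax([0.5, 0.5, 0.5, 0.5])  # untrained student: uniform output

# Distillation loss for this position; training would minimize it
# with respect to the student's parameters.
loss = kl_divergence(target, student)
print(f"KL(ensemble || student) = {loss:.4f}")
```

In a full training loop this loss would be summed over all word positions and minimized by gradient descent; the loss reaches zero only when the student reproduces the ensemble's distribution exactly.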