Data-Free Ensemble Knowledge Distillation for Privacy-conscious Multimedia Model Compression.

Zhiwei Hao, Yong Luo, Han Hu, Jianping An, Yonggang Wen
DOI: https://doi.org/10.1145/3474085.3475329
2021-01-01
Abstract: Recent advances in deep learning bring impressive performance for multimedia applications. Hence, compressing and deploying these applications on resource-limited edge devices via model compression becomes attractive. Knowledge distillation (KD) is one of the most popular model compression techniques. However, most well-behaved KD approaches require the original dataset, which is usually unavailable due to privacy issues, while existing data-free KD methods perform much worse than data-required counterparts. In this paper, we analyze previous data-free KD methods from the data perspective and point out that using a single pre-trained model limits the performance of these approaches. We then propose a Data-Free Ensemble knowledge Distillation (DFED) framework, which contains a student network, a generator network, and multiple pre-trained teacher networks. During training, the student mimics behaviors of the ensemble of teachers using samples synthesized by a generator, which aims to enlarge the prediction discrepancy between the student and teachers. A moment matching loss term assists the generator training by minimizing the distance between activations of synthesized samples and real samples. We evaluate DFED on three popular image classification datasets. Results demonstrate that our method achieves significant performance improvements compared with previous works. We also design an ablation study to verify the effectiveness of each component of the proposed framework.
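
The adversarial setup described in the abstract can be illustrated with a minimal sketch of the two opposing objectives. The abstract does not specify how teacher outputs are combined, which discrepancy measure is used, or where the real-sample activation statistics come from, so everything below is an assumption made for illustration: the teachers' softmax outputs are averaged, the discrepancy is an L1 distance, and the real-sample moments (`real_mean`, `real_var`) are treated as already available, for example as statistics stored in the pre-trained teachers. All function names and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(teacher_logits):
    """Average the teachers' softmax outputs (assumed combination rule)."""
    probs = [softmax(logits) for logits in teacher_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]


def discrepancy(student_logits, teacher_logits):
    """L1 distance between student and ensemble predictions (assumed measure)."""
    s = softmax(student_logits)
    t = ensemble_probs(teacher_logits)
    return sum(abs(a - b) for a, b in zip(s, t))


def moment_matching(synth_acts, real_mean, real_var):
    """Squared gap between the mean/variance of synthesized activations
    and the given real-sample statistics (assumed form of the loss)."""
    n = len(synth_acts)
    mean = sum(synth_acts) / n
    var = sum((a - mean) ** 2 for a in synth_acts) / n
    return (mean - real_mean) ** 2 + (var - real_var) ** 2


def student_loss(student_logits, teacher_logits):
    """The student minimizes its discrepancy from the teacher ensemble."""
    return discrepancy(student_logits, teacher_logits)


def generator_loss(student_logits, teacher_logits, synth_acts,
                   real_mean, real_var, lam=1.0):
    """The generator maximizes the discrepancy (hence the minus sign) while
    the moment matching term keeps its samples close to real statistics."""
    mm = moment_matching(synth_acts, real_mean, real_var)
    return -discrepancy(student_logits, teacher_logits) + lam * mm
```

In a real training loop these quantities would be computed on batches of network outputs and minimized alternately by gradient descent; the sketch only shows how the student and generator objectives pull in opposite directions on the same discrepancy term.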