An Efficient Method for Model Pruning Using Knowledge Distillation with Few Samples.

ZhaoJing Zhou, Yun Zhou, Zhuqing Jiang, Aidong Men, Haiying Wang
DOI: https://doi.org/10.1109/icassp43922.2022.9746024
2022-01-01
Abstract: Deep neural network compression methods can produce small-scale networks and rely on fine-tuning to recover the lost accuracy. Despite their remarkable performance, fine-tuning requires a huge training dataset, which makes it a time-consuming process. To address this issue, few-sample knowledge distillation (FSKD) has been proposed for data efficiency. However, FSKD needs to add extra convolution layers to the compressed network during training, which increases the complexity of the network structure. In this paper, we present Progressive Feature Distribution Distillation (PFDD), which does not modify the network structure and surpasses FSKD. Concretely, it is based on a progressive training strategy that efficiently matches feature distributions between the compressed network and the original network. We can thus exploit both external information from the samples and internal information from the network, so that a small proportion of the training dataset yields considerable results. Experiments on various datasets and architectures demonstrate that our distillation approach is efficient and effective in improving compressed networks’ performance even when only a few samples are used.
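The abstract does not spell out the PFDD loss or training schedule, so the following is only an illustrative sketch of what "matching feature distributions" between a compressed (student) and original (teacher) network could look like. It assumes, purely for illustration, that a feature distribution is summarized by per-channel mean and variance and that the progressive strategy adds blocks front to back. The names `channel_stats`, `distribution_gap`, and `progressive_schedule` are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pvariance


def channel_stats(features):
    """Per-channel (mean, variance), pooled over samples and positions.

    `features` is a list of samples; each sample is a list of channels;
    each channel is a flat list of activations (assumed layout).
    """
    n_channels = len(features[0])
    stats = []
    for c in range(n_channels):
        values = [v for sample in features for v in sample[c]]
        stats.append((fmean(values), pvariance(values)))
    return stats


def distribution_gap(student_feats, teacher_feats):
    """Assumed loss: squared gap in channel means plus squared gap in
    channel variances, averaged over channels."""
    s_stats = channel_stats(student_feats)
    t_stats = channel_stats(teacher_feats)
    gaps = [
        (s_mean - t_mean) ** 2 + (s_var - t_var) ** 2
        for (s_mean, s_var), (t_mean, t_var) in zip(s_stats, t_stats)
    ]
    return fmean(gaps)


def progressive_schedule(num_blocks):
    """Assumed schedule: stage k matches the features of blocks 0..k-1."""
    return [list(range(stage)) for stage in range(1, num_blocks + 1)]


# Two samples, one channel, two activations per channel.
teacher = [[[1.0, 3.0]], [[1.0, 3.0]]]
student = [[[2.0, 4.0]], [[2.0, 4.0]]]
gap = distribution_gap(student, teacher)
stages = progressive_schedule(3)
```

In a real setting the features would come from intermediate layers of both networks evaluated on the few available samples, and the gap would be minimized by gradient descent on the student's weights; the pure-Python version here only shows the quantity being compared.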