Flexible Knowledge Distillation with an Evolutional Network Population

Jie Lei, Zhao Liu, Mingli Song, Juan Xu, Jianping Shen, Ronghua Liang
DOI: https://doi.org/10.1109/ICME51207.2021.9428226
2021-01-01
Abstract: Deep neural networks have consistently surpassed traditional methods on a variety of computer vision tasks. Although deep neural networks are very powerful, their large number of parameters and complex structures consume considerable storage and computation time, making them hard to deploy with limited resources. To tackle this issue, many recently proposed knowledge distillation approaches aim to obtain a small student network that imitates a large teacher network. However, the student network structure is pre-defined and may be hard to train. In this paper, we propose to distill knowledge with an evolutional student network population. The population is initialized with several basic structures, and each network is evaluated by its ability to imitate the teacher network (i.e., its fitness). By reusing the weights, we provide five enhancement options to strengthen the networks with high fitness, and we abandon the weak ones. By changing the fitness criterion, we can select networks that meet different requirements, such as balancing size and accuracy. This allows one to find a superior student network structure that better imitates the teacher model from various aspects and is easier to train. The experimental results demonstrate that the proposed method achieves superior knowledge distillation performance with flexible student structures.
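
The idea of selecting students by a changeable fitness criterion can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Candidate`, `fitness`, `select`, `agreement`, and `size_weight`, the linear size penalty, and the toy numbers are all assumptions made for illustration. The five enhancement options are not modeled, since the abstract does not name them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical student network summarized by two numbers."""
    name: str
    agreement: float  # assumed imitation score: fraction of teacher outputs matched
    num_params: int   # parameter count of the student


def fitness(c, size_weight=0.0, max_params=1_000_000):
    """Imitation score minus an optional size penalty (assumed criterion)."""
    return c.agreement - size_weight * (c.num_params / max_params)


def select(population, keep, size_weight=0.0):
    """Keep the `keep` fittest candidates; the remaining ones are abandoned."""
    ranked = sorted(population, key=lambda c: fitness(c, size_weight), reverse=True)
    return ranked[:keep]


# Toy population with made-up scores, for illustration only.
population = [
    Candidate("small", 0.80, 100_000),
    Candidate("medium", 0.88, 400_000),
    Candidate("large", 0.90, 900_000),
]

# Fitness based on imitation alone favors the larger students.
accuracy_only = [c.name for c in select(population, keep=2)]

# Adding a size penalty shifts the selection toward smaller students.
size_aware = [c.name for c in select(population, keep=2, size_weight=0.2)]

print(accuracy_only, size_aware)
```

Under these assumptions, changing only `size_weight` changes which students survive, which is one way to read "balancing size and accuracy" in the abstract.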