Learning Slimming SSD Through Pruning and Knowledge Distillation

Zhishan Li, Xiaozhou Xu, Lei Xie, Hongye Su
DOI: https://doi.org/10.1109/cac48633.2019.8996995
2019-01-01
Abstract: SSD [2] performs very well in object detection. To detect the objects in an image, SSD generates prior boxes of several scales on feature maps of different scales. It then uses the base CNN to extract features for the final classification and regression. Although this simplifies the pipeline compared with Faster R-CNN, the FLOPs and the storage size of the model are still too large for ordinary embedded devices. Parameter pruning is a popular method for compressing DNNs: it removes redundant weights from the trained model to obtain a lighter one. However, accuracy usually drops after pruning. Knowledge distillation [15] is a relatively new training method that can recover part of this loss: a teacher network guides the training of a smaller, less accurate network, which enables the small model to perform better. In this paper, we take SSD as the original network for object detection on VOC2007 and prune it. Finally, we use knowledge distillation to improve the accuracy of the pruned model.
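The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not state the pruning criterion or the exact distillation loss, so everything below is an assumption made for illustration: magnitude-based weight pruning, and a temperature-softened distillation loss applied to classification logits only. The names `magnitude_prune`, `distillation_loss`, `temperature`, and `alpha` are hypothetical and do not come from the paper.

```python
import math


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: magnitude is used as the redundancy criterion; the paper's
    actual criterion is not given in the abstract.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Soft term: KL(teacher || student) on temperature-softened distributions,
    scaled by temperature**2. Hard term: cross-entropy with the true label.
    `temperature` and `alpha` are illustrative defaults, not paper values.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    hard = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Prune half of a toy weight vector, then score a student against a teacher.
    print(magnitude_prune([0.5, -0.01, 0.2, 0.03], sparsity=0.5))
    print(distillation_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], label=0))
```

This sketch covers only the classification branch. An SSD detector also has a box-regression branch, and how the teacher's guidance is applied there is not described in the abstract, so it is left out.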