TC³KD: Knowledge distillation via teacher-student cooperative curriculum customization

Chaofei Wang, Ke Yang, Shaowei Zhang, Gao Huang, Shiji Song
DOI: https://doi.org/10.1016/j.neucom.2022.07.055
IF: 6
2022-01-01
Neurocomputing
Abstract: Knowledge distillation aims to improve the performance of a lightweight student network by transferring some knowledge from a large-scale teacher network. Most existing knowledge distillation methods follow the traditional training strategy, which feeds a sequence of mini-batches sampled randomly from the training set. Inspired by curriculum learning, we propose a novel knowledge distillation method via teacher-student cooperative curriculum customization. Specifically, a weighted ensemble of the teacher and a snapshot student is designed to measure the difficulty of samples. The ensemble weights and the snapshot student are dynamically updated in the difficulty measurer, which customizes appropriate curricula to guide the student network in different training stages. A "fetch and remove in balance" training scheduler is adopted to maintain training stability and reduce the ranking cost. Extensive experiments on CIFAR-100, CINIC-10 and ImageNet validate the effectiveness of our method. As an independent training strategy for distillation, the proposed teacher-student cooperative curriculum customization paradigm can also be combined with mainstream knowledge distillation approaches to improve their performance. © 2022 Published by Elsevier B.V.
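To make the difficulty-measurer idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract states only that a weighted ensemble of the teacher and a snapshot student scores sample difficulty; the scoring rule used here (cross-entropy of the ensemble's averaged probabilities against the true label), the function names, and the `teacher_weight` parameter are all assumptions made for illustration. The "fetch and remove in balance" scheduler is not sketched because the abstract gives no detail about it.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def ensemble_difficulty(teacher_logits, snapshot_logits, labels, teacher_weight):
    """Score each sample with a weighted teacher/snapshot-student ensemble.

    ASSUMPTION: difficulty is taken to be the cross-entropy of the ensemble
    probabilities against the true label. The abstract does not specify the
    measure, so this is only one plausible choice.
    """
    probs = (teacher_weight * softmax(teacher_logits)
             + (1.0 - teacher_weight) * softmax(snapshot_logits))
    true_class_prob = probs[np.arange(len(labels)), labels]
    return -np.log(true_class_prob + 1e-12)


def easy_to_hard_order(difficulty):
    """Return sample indices sorted from easiest (lowest score) to hardest."""
    return np.argsort(difficulty, kind="stable")
```

In a training loop, one would presumably recompute the scores whenever the snapshot student or `teacher_weight` is refreshed, so that the curriculum tracks the student's current ability. How often that happens, and how the weight is set, are details of the paper that this sketch does not attempt to reproduce.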