Adversarial co-distillation learning for image recognition

Haoran Zhang, Zhenzhen Hu, Wei Qin, Mingliang Xu, Meng Wang
DOI: https://doi.org/10.1016/j.patcog.2020.107659
IF: 8
2021-03-01
Pattern Recognition
Abstract: Knowledge distillation is an effective way to transfer knowledge from a pre-trained teacher model to a student model. Co-distillation, an online variant of distillation, further accelerates training and opens a new way to explore the "dark knowledge" by training n models in parallel. In this paper, we explore "divergent examples", which cause the classifiers to make different predictions and thus induce the "dark knowledge", and we propose a novel approach named Adversarial Co-distillation Networks (ACNs) to enhance the "dark knowledge" by generating extra divergent examples. Note that we do not involve any extra dataset; we use only the standard training set to train the entire framework. ACNs are end-to-end frameworks composed of two parts: an adversarial phase consisting of Generative Adversarial Networks (GANs) that generate the divergent examples, and a co-distillation phase consisting of multiple classifiers that learn from the divergent examples. These two phases are learned in an iterative and adversarial way. To guarantee the quality of the divergent examples and the stability of ACNs, we further design a "Weakly Residual Connection" module and a "Restricted Adversarial Search" module to assist the training process. Extensive experiments with various deep architectures on different datasets demonstrate the effectiveness of our approach.
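As a rough illustration of the co-distillation phase described in the abstract, the sketch below computes a per-model loss for one example across several classifiers and flags whether that example is "divergent" (the classifiers disagree on the predicted class). It is not the paper's objective: it uses a generic mutual-distillation loss (cross-entropy plus the average KL divergence from each peer's softened prediction), it omits the GAN that generates divergent examples, and the names `co_distillation_losses`, `is_divergent`, `temperature`, and `alpha` are hypothetical choices made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def co_distillation_losses(all_logits, label, temperature=2.0, alpha=0.5):
    """Return one loss per model for a single example.

    all_logits: list of logit vectors, one per classifier (needs >= 2 models).
    temperature, alpha: assumed hyperparameters, not taken from the paper.
    """
    soft = [softmax(logits, temperature) for logits in all_logits]
    losses = []
    for i, logits in enumerate(all_logits):
        # Supervised term: cross-entropy against the ground-truth label.
        ce = -math.log(softmax(logits)[label] + 1e-12)
        # Distillation term: average KL from each peer's softened output.
        peers = [soft[j] for j in range(len(soft)) if j != i]
        kd = sum(kl_divergence(p, soft[i]) for p in peers) / len(peers)
        losses.append((1 - alpha) * ce + alpha * kd)
    return losses

def is_divergent(all_logits):
    """True when the classifiers do not all predict the same class."""
    preds = [max(range(len(l)), key=l.__getitem__) for l in all_logits]
    return len(set(preds)) > 1

if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.3, 1.8, -0.5]]  # two classifiers, three classes
    print(co_distillation_losses(logits, label=0))
    print(is_divergent(logits))
```

When the classifiers agree exactly, the distillation term vanishes and only the supervised term remains; disagreement is what produces a non-zero peer signal, which is the intuition behind seeking out divergent examples.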
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic