Knowledge Distillation with Category-Aware Attention and Discriminant Logit Losses

Lei Jiang, Wengang Zhou, Houqiang Li
DOI: https://doi.org/10.1109/icme.2019.00308
2019-01-01
Abstract: Deep neural networks (DNNs) usually require large amounts of storage and computation, which limits their deployment on resource-constrained platforms. Knowledge distillation is an effective way to address this limitation by transferring knowledge from a large but accurate teacher model to a small yet fast student model. In this paper, we propose two objective functions to optimize the knowledge transfer process. First, we propose a category-aware attention loss, which works at the convolutional feature level and captures object localization information. Second, we propose a discriminant logit loss at the fully-connected feature level to capture classification information. Combined, the two objective functions integrate features from different levels and guide the training of the student. We demonstrate the effectiveness of our approach on several CNN models across various datasets, and show consistent performance gains with the proposed method.
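For readers unfamiliar with the teacher-student setup the abstract refers to, the following is a minimal sketch of the widely used baseline distillation objective (temperature-softened logits matched with a KL term, plus a cross-entropy term on the true label). It is not the category-aware attention loss or the discriminant logit loss proposed in the paper, whose definitions are not given in the abstract. The function names and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Baseline distillation loss for one example (illustrative sketch only).

    alpha weights the soft-target term; (1 - alpha) weights the hard-label term.
    Both defaults are assumptions chosen for illustration.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) between the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy of the student against the ground-truth label.
    ce = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [6.0, 2.0, -1.0]
    student = [3.0, 2.5, 0.0]
    print(distillation_loss(student, teacher, label=0))
```

With `alpha=1.0` the loss reduces to the soft-target term alone, which is zero when the student reproduces the teacher's logits exactly.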