Grouped Logit Distillation Enhanced with Superclass Awareness for Efficient Knowledge Transfer

Shuoxi Zhang, Hanpeng Liu, Yuyi Wang, Kun He
DOI: https://doi.org/10.3233/faia240786
2024-01-01
Abstract: Knowledge distillation (KD) facilitates student training by transferring information beyond plain labels, specifically through the categorical relationships from the teacher. However, this class relationship knowledge is, by nature, easily dominated by a few classes. This phenomenon prevents knowledge distillation from fully extracting the knowledge of the teacher model, thereby impeding the transfer of knowledge. To this end, we introduce a grouping strategy to the knowledge distillation paradigm, termed Grouped Logit Distillation (GLD). This strategy involves distilling knowledge within each group and across all groups, potentially transferring relationships in a comprehensive manner. Furthermore, we delve deeper into the grouping mechanism and attempt to incorporate a superclass mechanism using information derived from features of the teacher model. Our enhanced version, GLD++, performs knowledge distillation more meticulously by organizing information based on superclasses. We evaluate the effectiveness of our approaches through extensive experiments across standard benchmark datasets, obtaining state-of-the-art performance.
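The abstract describes distilling "within each group and across all groups" but does not state the loss itself. The following is a minimal sketch of one way such a grouped objective could look, not the authors' formulation. Everything specific here is an assumption made for illustration: the function names, the use of KL divergence for both terms, the unweighted sum of the two terms, the temperature value, and the fixed class partition passed in as `groups`.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax with max-subtraction for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def grouped_kd_loss(teacher_logits, student_logits, groups, temperature=4.0):
    # Hypothetical grouped distillation loss for a single sample.
    # `groups` is assumed to be a partition of class indices, e.g. [[0, 1], [2, 3]].
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)

    # Across-group term: match the probability mass assigned to each group.
    group_t = [sum(p_t[i] for i in g) for g in groups]
    group_s = [sum(p_s[i] for i in g) for g in groups]
    inter = kl_divergence(group_t, group_s)

    # Within-group term: renormalize inside each group, so classes that
    # dominate the full distribution cannot mask the remaining groups.
    intra = 0.0
    for g in groups:
        t_in = softmax([teacher_logits[i] for i in g], temperature)
        s_in = softmax([student_logits[i] for i in g], temperature)
        intra += kl_divergence(t_in, s_in)

    # Assumption: the two terms are simply added with equal weight.
    return inter + intra

teacher = [5.0, 1.0, 0.5, -1.0, -2.0, 0.0]
student = [0.0, 2.0, 1.0, 3.0, -1.0, 0.5]
groups = [[0, 1, 2], [3, 4, 5]]
loss = grouped_kd_loss(teacher, student, groups)
```

In this sketch the second group's internal ordering contributes to the loss even though the teacher puts most of its mass on class 0, which is the kind of dominance effect the abstract says plain KD suffers from. A superclass-aware variant in the spirit of GLD++ would presumably derive `groups` from teacher features rather than fixing them by hand; how the paper does that is not specified in the abstract.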