Abstract: This work introduces a novel knowledge distillation framework for classification tasks in which information about existing subclasses is available and taken into consideration. In classification tasks with a small number of classes, or in binary detection, the amount of information transferred from the teacher to the student is restricted, which limits the utility of knowledge distillation. Performance can be improved by leveraging information about possible subclasses within the classes. To that end, we propose Subclass Knowledge Distillation (SKD), a process of transferring the knowledge of predicted subclasses from a teacher to a smaller student. Meaningful information that is not in the teacher's class logits but exists in its subclass logits (e.g., similarities within classes) is conveyed to the student through SKD, which in turn boosts the student's performance. Analytically, we measure how much extra information the teacher can provide to the student via SKD to demonstrate the efficacy of our work. The framework is evaluated in a clinical application, namely colorectal polyp binary classification, a practical problem with two classes and a number of subclasses per class. In this application, clinician-provided annotations are used to define subclasses based on the variability of the annotation labels, in a curriculum style of learning. A lightweight, low-complexity student trained with the SKD framework achieves an F1-score of 85.05%, a gain of 1.47% over a student trained with conventional knowledge distillation and of 2.10% over a student trained without it. The 2.10% F1-score gap between students trained with and without SKD can be explained by the extra subclass knowledge, i.e., the extra 0.4656 label bits per sample that the teacher can transfer in our experiment.
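The abstract does not state the SKD objective, so the following is only a minimal sketch of the idea it describes, under stated assumptions: both networks emit one logit per subclass, class probabilities are obtained by summing the subclass probabilities that belong to each class, and the student is trained with a hard-label cross-entropy term plus a temperature-softened KL term on the subclass distributions. The function names, the `subclass_to_class` mapping, and the `temperature` and `alpha` parameters are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(subclass_probs, subclass_to_class, num_classes):
    """Sum subclass probabilities into their parent classes (assumed mapping)."""
    out = [0.0] * num_classes
    for p, c in zip(subclass_probs, subclass_to_class):
        out[c] += p
    return out


def skd_loss(student_logits, teacher_logits, true_class,
             subclass_to_class, num_classes, temperature=4.0, alpha=0.5):
    """Illustrative subclass distillation loss (an assumption, not the paper's).

    alpha weights the hard-label cross-entropy on class probabilities;
    (1 - alpha) weights KL(teacher || student) on softened subclass outputs.
    """
    # Hard-label term: cross-entropy on the aggregated class probabilities.
    student_cls = class_probs(softmax(student_logits),
                              subclass_to_class, num_classes)
    ce = -math.log(student_cls[true_class])

    # Soft term: match the teacher's softened subclass distribution.
    t_soft = softmax(teacher_logits, temperature)
    s_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(t_soft, s_soft) if t > 0)

    # The T^2 factor is the usual scaling used with softened targets.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    # Two classes, two subclasses each (hypothetical layout).
    mapping = [0, 0, 1, 1]
    teacher = [2.0, 0.5, -1.0, -0.5]
    student = [1.0, 1.0, 0.0, 0.0]
    print(skd_loss(student, teacher, 0, mapping, 2))
```

In a binary task the teacher's class-level output carries at most one soft value per sample, whereas the subclass distribution in this sketch carries one value per subclass, which is the kind of extra within-class information the abstract refers to.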

Distilling Knowledge via Intermediate Classifiers

What Knowledge Gets Distilled in Knowledge Distillation?

On the Efficacy of Knowledge Distillation

Tree-like Decision Distillation

Knowledge Distillation with the Reused Teacher Classifier

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Can a student Large Language Model perform as well as it's teacher?

Reinforced Multi-Teacher Selection for Knowledge Distillation

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Knowledge distillation based on projector integration and classifier sharing

Knowledge Distillation with a Precise Teacher and Prediction with Abstention

Improved Knowledge Distillation via Teacher Assistant

Deeply-Supervised Knowledge Distillation

Multi-Teacher Knowledge Distillation for Incremental Implicitly-Refined Classification

Improved Knowledge Distillation via Adversarial Collaboration

Subclass Knowledge Distillation with Known Subclass Labels

A Two-Teacher Framework For Knowledge Distillation

Boosting Knowledge Distillation Via Intra-class Logit Distribution Smoothing

Generalized Knowledge Distillation via Relationship Matching

Collaborative Knowledge Distillation

Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation