Abstract: Knowledge distillation (KD) is a prevalent model compression technique in deep learning that leverages knowledge from a large teacher model to enhance the training of a smaller student model. It has been used successfully to deploy compact deep models in intelligent applications such as intelligent transportation, smart health, and distributed intelligence. Current knowledge distillation methods fall primarily into two categories: offline and online. Offline methods use a one-way distillation process that transfers fixed knowledge from teacher to student, while online methods train multiple peer students simultaneously. However, existing methods face two challenges: the student may not fully comprehend the teacher's knowledge because of the gap in model capacity, and, without teacher guidance, the outputs of multiple students may carry incongruent knowledge. To address these issues, we propose a novel reciprocal teacher-student learning scheme, inspired by human teaching and examination, based on forward and feedback knowledge distillation (FFKD). Forward knowledge distillation operates offline, while feedback knowledge distillation follows an online scheme. The rationale is that feedback knowledge distillation lets the pre-trained teacher model receive feedback from the students, so that the teacher can refine its teaching strategies accordingly. To achieve this, we introduce a new weighting constraint that gauges how well the students understand the teacher's knowledge; this measure is then used to improve the teaching strategies. Experimental results on five visual recognition datasets demonstrate that the proposed FFKD outperforms current state-of-the-art knowledge distillation methods.

A Two-Teacher Framework For Knowledge Distillation

Collaborative Knowledge Distillation

A Novel Framework for Online Knowledge Distillation

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation

Improving knowledge distillation via an expressive teacher

TC³KD: Knowledge distillation via teacher-student cooperative curriculum customization

Teacher-student collaborative knowledge distillation for image classification

What Knowledge Gets Distilled in Knowledge Distillation?

Improving Knowledge Distillation With a Customized Teacher

A Unified Asymmetric Knowledge Distillation Framework for Image Classification

Exploring the Knowledge Transferred by Response-Based Teacher-Student Distillation

Attribute Structured Knowledge Distillation

Learning Student-Friendly Teacher Networks for Knowledge Distillation

Highlight Every Step: Knowledge Distillation via Collaborative Teaching

An Embarrassingly Simple Approach for Knowledge Distillation

Knowledge distillation based on multi-layer fusion features

Reciprocal Teacher-Student Learning Via Forward and Feedback Knowledge Distillation

Dual teachers for self-knowledge distillation

Feature Fusion-Based Collaborative Learning for Knowledge Distillation

Skill-transferring Knowledge Distillation Method