Abstract: As a specific form of knowledge distillation (KD), self-knowledge distillation (self-KD) enables a student network to progressively distill its own knowledge without relying on a pretrained, complex teacher network. However, recent studies of self-KD have found that the additional dark knowledge captured by auxiliary architectures or data augmentation can create better soft targets for enhancing the network, but at the cost of significantly more computation and/or parameters. Moreover, most existing self-KD methods extract the soft label used as a supervisory signal from individual input samples, which overlooks knowledge about the relationships among categories. Inspired by human associative learning, we propose a simple yet effective self-KD method named associative learning for self-distillation (ALSD), which progressively distills richer knowledge regarding the relationships between categories across independent samples. Specifically, during distillation, the propagation of knowledge is weighted by the intersample relationships between associated samples generated in different minibatches, which are progressively estimated with the current network. In this way, our ALSD framework progressively ensembles knowledge across multiple samples using a single network, resulting in minimal computational and memory overhead compared to existing ensembling methods. Extensive experiments demonstrate that ALSD consistently boosts the classification performance of various architectures on multiple datasets. Notably, ALSD pushes self-KD performance to 80.10% on CIFAR-100, which exceeds standard backpropagation training by 4.81%. Furthermore, we observe that the proposed method performs comparably to state-of-the-art knowledge distillation methods without a pretrained teacher network.
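The weighting idea described in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not specify. It assumes that the intersample relationship is measured by the dot product of softened predictions, that the weights are normalized with a softmax, and that the distillation loss is a KL divergence. All function names and the temperature value are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with an optional temperature.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def associative_soft_targets(current_probs, associated_probs):
    # Assumption: similarity between two samples is the dot product of
    # their predicted class distributions. Each current sample receives a
    # soft target that mixes the associated samples' predictions, weighted
    # by that similarity.
    targets = []
    for p in current_probs:
        sims = [sum(a * b for a, b in zip(p, q)) for q in associated_probs]
        weights = softmax(sims)
        target = [
            sum(w * q[c] for w, q in zip(weights, associated_probs))
            for c in range(len(p))
        ]
        targets.append(target)
    return targets

def kl_divergence(target, pred, eps=1e-12):
    # KL(target || pred), with a small eps to avoid log(0).
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, pred))

def self_distillation_loss(current_logits, associated_logits, temperature=4.0):
    # Both sets of logits come from the same network; the associated ones
    # stand in for predictions on samples from a different minibatch.
    # In real training the targets would be treated as constants (no gradient).
    cur = [softmax(z, temperature) for z in current_logits]
    assoc = [softmax(z, temperature) for z in associated_logits]
    targets = associative_soft_targets(cur, assoc)
    losses = [kl_divergence(t, p) for t, p in zip(targets, cur)]
    return sum(losses) / len(losses)
```

In an actual training loop, a term like this would typically be added to the usual cross-entropy loss, and the associated logits would be cached from an earlier minibatch so that only a single network is needed.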

Skill Enhancement Learning with Knowledge Distillation

Skill-transferring Knowledge Distillation Method

Diffskill: Improving Reinforcement Learning Through Diffusion-Based Skill Denoiser for Robotic Manipulation

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

Multi-target Knowledge Distillation Via Student Self-reflection

Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Leveraging Knowledge Distillation for Efficient Deep Reinforcement Learning in Resource-Constrained Environments

Extending Label Smoothing Regularization with Self-Knowledge Distillation

Self-Knowledge Distillation via Progressive Associative Learning

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Refine to the essence: Less-redundant skill learning via diversity clustering

Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations

Goal-Conditioned Q-Learning as Knowledge Distillation

SkillS: Adaptive Skill Sequencing for Efficient Temporally-Extended Exploration

SLIM: Skill Learning with Multiple Critics

An Embarrassingly Simple Approach for Knowledge Distillation

Revisiting Knowledge Distillation: an Inheritance and Exploration Framework

Improving Knowledge Distillation via Transferring Learning Ability

Learning to Teach with Student Feedback