Abstract: Self-distillation uses the model's own information to improve its generalization ability and is a promising research direction. Existing self-distillation methods require additional models, modifications to the model, or an enlarged batch size during training. These requirements make them harder to use and increase memory consumption and computational cost. This paper develops Self-discipline on multiple channels (SMC), which uses the concept of multiple channels to combine consistency regularization with self-distillation. Conceptually, SMC consists of two steps. 1) The data of each channel are passed through the model simultaneously to obtain the corresponding soft labels, which are saved. 2) The soft labels saved in the previous step are read and combined with the soft label obtained by passing the current channel's data through the model, and the result is used to compute the loss function. Through consistency regularization and self-distillation, SMC improves both the generalization ability of the model and its robustness to noisy labels. We refer to SMC with only two channels as SMC-2. Comparative experiments on two datasets show that SMC-2 outperforms Label Smoothing Regularization and Self-Distillation from the Last Mini-Batch on all models. They also show that SMC-2 outperforms the state-of-the-art Sharpness-Aware Minimization method on 83% of the models. Experiments on the compatibility of SMC-2 with data augmentation show that combining the two improves the generalization ability of the model by 0.28% to 1.80% compared with using data augmentation alone. Finally, label-noise experiments show that SMC-2 curbs the decline in generalization ability that label noise otherwise causes late in training. The code is available at https://github.com/JiuTiannn/SMC-Self-discipline-on-multiple-channels.
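The two steps of SMC-2 can be illustrated with the minimal sketch below. It is not the authors' implementation, which is in the repository linked above. Several details are assumptions made only for illustration: the channel names `"a"` and `"b"`, the temperature, the weight `alpha`, and the use of KL divergence. It also assumes each channel is distilled toward the soft label saved by the *other* channel. The helper names `save_soft_labels` and `smc2_loss` are hypothetical. Logits are plain Python lists, so no gradients are computed. In an autograd framework, the saved soft labels would be detached before use.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross-entropy of a probability vector against a hard (integer) label."""
    return -math.log(probs[label])


def kl_divergence(target: Sequence[float], pred: Sequence[float]) -> float:
    """KL(target || pred); classes with zero target mass contribute nothing."""
    return sum(t * math.log(t / p) for t, p in zip(target, pred) if t > 0)


def save_soft_labels(channel_logits: Dict[str, Sequence[float]],
                     temperature: float = 3.0) -> Dict[str, List[float]]:
    """Step 1 (hypothetical helper): compute and store each channel's soft label."""
    return {ch: softmax(logits, temperature) for ch, logits in channel_logits.items()}


def smc2_loss(channel_logits: Dict[str, Sequence[float]],
              saved_soft_labels: Dict[str, List[float]],
              label: int,
              temperature: float = 3.0,
              alpha: float = 1.0) -> float:
    """Step 2 (hypothetical): loss for one sample with two channels "a" and "b".

    ASSUMPTION: each channel gets a hard-label cross-entropy term plus a
    distillation term toward the soft label the other channel saved in step 1.
    """
    other = {"a": "b", "b": "a"}
    total = 0.0
    for ch, logits in channel_logits.items():
        ce = cross_entropy(softmax(logits), label)
        student = softmax(logits, temperature)
        teacher = saved_soft_labels[other[ch]]  # read from the buffer, used as a fixed target
        kd = kl_divergence(teacher, student) * temperature ** 2
        total += ce + alpha * kd
    return total / len(channel_logits)


# Usage: two channels' logits for a single sample whose true class is 0.
logits = {"a": [2.0, 0.5, -1.0], "b": [1.5, 0.8, -0.5]}
buffer = save_soft_labels(logits, temperature=3.0)  # step 1
loss = smc2_loss(logits, buffer, label=0)           # step 2
```

The saved soft labels are read as fixed targets, so the distillation term pulls the two channels toward agreement. This is the consistency-regularization effect the abstract describes. When both channels produce identical logits, the term vanishes and the loss reduces to ordinary cross-entropy.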