Abstract: The rapid evolution of deep learning has led to significant achievements in computer vision, driven primarily by complex convolutional neural networks (CNNs). However, the increasing depth and parameter count of these networks often lead to overfitting and high computational cost. Knowledge distillation (KD) addresses these issues by transferring knowledge from a large, well-trained teacher model to a more compact student model. This paper introduces a novel knowledge distillation method that simplifies the distillation process and narrows the performance gap between teacher and student models without relying on intricate knowledge representations. Our approach uses a streamlined teacher network architecture designed to make knowledge transfer more efficient and effective, enabling the student model to achieve high accuracy with reduced computational demands. Experiments on the CIFAR-10 dataset show that the proposed model outperforms traditional KD methods and established architectures such as ResNet and VGG. The method maintains high accuracy while significantly reducing training and validation losses. The best-performing hyperparameter settings (temperature T = 15.0 and smoothing factor α = 0.7) yield the highest validation accuracy and the lowest loss values. This research contributes to both the theory and practice of knowledge distillation, providing a robust framework for future work on neural network compression and optimization. The simplicity and efficiency of the approach support more accessible and scalable deployment of deep learning models.
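To make the roles of the two reported hyperparameters concrete, the sketch below shows a conventional temperature-scaled distillation loss. It is not taken from the paper: the abstract does not state the loss, so the soft/hard weighting by α, the T² scaling, and the function names `log_softmax` and `kd_loss` are assumptions made for illustration only.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, temperature=15.0, alpha=0.7):
    """Conventional KD loss (assumed form, not confirmed by the abstract).

    alpha weights the soft (teacher-matching) term and (1 - alpha) weights
    the hard-label cross-entropy; this reading of alpha is an assumption.
    """
    # Soft targets: teacher and student distributions at temperature T.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student_t = log_softmax(student_logits, temperature)
    p_teacher = np.exp(log_p_teacher)

    # KL(teacher || student), averaged over the batch and scaled by T^2
    # so its gradient magnitude stays comparable to the hard-label term.
    kl = np.sum(p_teacher * (log_p_teacher - log_p_student_t), axis=-1).mean()
    soft_term = temperature ** 2 * kl

    # Hard targets: ordinary cross-entropy at temperature 1.
    log_p_student = log_softmax(student_logits)
    hard_term = -log_p_student[np.arange(len(labels)), labels].mean()

    return alpha * soft_term + (1.0 - alpha) * hard_term
```

Under these assumptions, a high temperature such as T = 15.0 flattens the teacher's output distribution so that the relative probabilities of incorrect classes carry more signal, while α = 0.7 would place most of the weight on matching the teacher rather than the ground-truth labels.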

Like Teacher, Like Pupil: Transferring Backdoors Via Feature-Based Knowledge Distillation

Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model

Anti-Distillation Backdoor Attacks: Backdoors Can Really Survive in Knowledge Distillation

Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks

Transferring Backdoors between Large Language Models by Knowledge Distillation

Revisiting Data-Free Knowledge Distillation with Poisoned Teachers

Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

Safe Distillation Box

A Practical Trigger-Free Backdoor Attack on Neural Networks

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Backdoor Defense via Decoupling the Training Process

Regula Sub-rosa: Latent Backdoor Attacks on Deep Neural Networks

Reverse Backdoor Distillation: Towards Online Backdoor Attack Detection for Deep Neural Network Models

Model Mimic Attack: Knowledge Distillation for Provably Transferable Adversarial Examples

Simplified Knowledge Distillation for Deep Neural Networks Bridging the Performance Gap with a Novel Teacher–Student Architecture

An Embarrassingly Simple Approach for Knowledge Distillation

NBA: defensive distillation for backdoor removal via neural behavior alignment

Undistillable: Making A Nasty Teacher That CANNOT teach students

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Distilling the Undistillable: Learning from a Nasty Teacher

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection