Abstract: The rapid evolution of deep learning has led to significant achievements in computer vision, primarily driven by complex convolutional neural networks (CNNs). However, the increasing depth and parameter count of these networks often result in overfitting and elevated computational demands. Knowledge distillation (KD) has emerged as a promising technique to address these issues by transferring knowledge from a large, well-trained teacher model to a more compact student model. This paper introduces a novel knowledge distillation method that simplifies the distillation process and narrows the performance gap between teacher and student models without relying on intricate knowledge representations. Our approach uses a streamlined teacher network architecture designed to make knowledge transfer more efficient and effective, enabling the student model to achieve high accuracy with reduced computational demands. Comprehensive experiments on the CIFAR-10 dataset demonstrate that the proposed model outperforms traditional KD methods and established architectures such as ResNet and VGG networks. The proposed method not only maintains high accuracy but also significantly reduces training and validation losses. Among the hyperparameter settings tested, temperature T = 15.0 and smoothing factor α = 0.7 yield the highest validation accuracy and the lowest loss values. This research contributes to theoretical and practical advances in knowledge distillation, providing a robust framework for future applications and research in neural network compression and optimization. The simplicity and efficiency of our approach pave the way for more accessible and scalable solutions in deep learning model deployment.
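For readers unfamiliar with the roles of the temperature T and the factor α, the following is a minimal sketch of the classic soft-target distillation loss, not the specific method proposed in this paper. It assumes that α weights the temperature-softened teacher term and (1 − α) weights the ordinary cross-entropy term, and that the soft term is scaled by T²; the abstract does not state how α enters the loss, so this weighting, along with the function names `softmax` and `kd_loss`, is an illustrative assumption.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, T=15.0, alpha=0.7):
    """Classic soft-target KD loss for one example (assumed formulation).

    alpha weights the soft (teacher) term; (1 - alpha) weights the hard
    (ground-truth) term. This weighting is an assumption, not taken from
    the paper.
    """
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # Soft term: KL(teacher || student) at temperature T, scaled by T^2
    # so its gradient magnitude stays comparable to the hard term.
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    ) * T * T
    # Hard term: cross-entropy against the true label at temperature 1.
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1.0 - alpha) * hard


# Hypothetical logits for a 3-class problem, for illustration only.
example_loss = kd_loss([2.0, 1.0, 0.1], [3.0, 0.5, 0.2], label=0)
```

A high temperature such as T = 15.0 flattens both distributions, so the student is trained to match the teacher's relative ranking of the non-target classes as well as its top prediction.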

Explore a Novel Knowledge Distillation Framework for Network Learning and Low-Bit Quantization

DCCD: Reducing Neural Network Redundancy Via Distillation

Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Divide and Conquer: Leveraging Intermediate Feature Representations for Quantized Training of Neural Networks

Data-Free Low-Bit Quantization Via Dynamic Multi-teacher Knowledge Distillation

Quantized Feature Distillation for Network Quantization

Quantifying the Knowledge in a DNN to Explain Knowledge Distillation for Classification

Towards Low-Bit Quantization of Deep Neural Networks with Limited Data

ResKD: Residual-Guided Knowledge Distillation

Deep Transferring Quantization

LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks

Quantization Networks

Using Distillation to Improve Network Performance after Pruning and Quantization

Simplified Knowledge Distillation for Deep Neural Networks Bridging the Performance Gap with a Novel Teacher–Student Architecture

Channel Distillation: Channel-Wise Attention for Knowledge Distillation

Knowledge Distillation Based on Narrow-Deep Networks

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

Iterative Deep Neural Network Quantization with Lipschitz Constraint

Self-Distillation: Towards Efficient and Compact Neural Networks