Abstract: The rapid evolution of deep learning has led to significant achievements in computer vision, primarily driven by complex convolutional neural networks (CNNs). However, the increasing depth and parameter count of these networks often result in overfitting and elevated computational demands. Knowledge distillation (KD) has emerged as a promising technique to address these issues by transferring knowledge from a large, well-trained teacher model to a more compact student model. This paper introduces a novel knowledge distillation method that simplifies the distillation process and narrows the performance gap between teacher and student models without relying on intricate knowledge representations. Our approach uses a streamlined teacher network architecture designed to make knowledge transfer more efficient and effective, enabling the student model to achieve high accuracy with reduced computational demands. Comprehensive experiments on the CIFAR-10 dataset demonstrate that the proposed model outperforms traditional KD methods and established architectures such as ResNet and VGG networks. The proposed method maintains high accuracy while significantly reducing training and validation losses. The best-performing hyperparameter settings are temperature T = 15.0 and smoothing factor α = 0.7, which yield the highest validation accuracy and lowest loss values. This research contributes to theoretical and practical advances in knowledge distillation, providing a robust framework for future applications and research in neural network compression and optimization. The simplicity and efficiency of our approach pave the way for more accessible and scalable solutions in deep learning model deployment.
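The abstract reports a temperature T and a factor α but does not state the loss itself. As a rough illustration of how such hyperparameters typically enter a distillation objective, the sketch below assumes the widely used temperature-scaled formulation: a KL term between softened teacher and student distributions, multiplied by T², blended with a hard-label cross-entropy term. Treating α as the weight on the distillation term is an assumption, as are the function names `softmax` and `kd_loss`; the paper's actual objective may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=15.0, alpha=0.7):
    """Hypothetical per-sample KD loss (assumed form, not taken from the paper).

    alpha weights the soft (teacher) term and 1 - alpha the hard-label term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    soft = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    soft *= temperature ** 2  # keeps gradient magnitudes comparable across T
    # Standard cross-entropy against the ground-truth label at T = 1
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

# Example call with made-up logits for a 3-class problem
loss = kd_loss([1.0, 0.5, -0.2], [3.0, 1.0, 0.0], label=0)
```

A high temperature such as T = 15.0 flattens both distributions, so the student is trained mostly on the relative ordering of the teacher's non-target classes rather than on its top prediction alone.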

Layer-by-Layer Knowledge Distillation for Training Simplified Bipolar Morphological Neural Networks

DCCD: Reducing Neural Network Redundancy Via Distillation

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks

A foundation for exact binarized morphological neural networks

Biologically Inspired Structure Learning with Reverse Knowledge Distillation for Spiking Neural Networks

Accuracy Versus Simplification in an Approximate Logic Neural Model

Lipschitz Continuity Guided Knowledge Distillation

BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

An adiabatic method to train binarized artificial neural networks

Simplified Knowledge Distillation for Deep Neural Networks Bridging the Performance Gap with a Novel Teacher–Student Architecture

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

SBNN: Slimming binarized neural network

Self-knowledge distillation enhanced binary neural networks derived from underutilized information

Training Simplification and Model Simplification for Deep Learning : A Minimal Effort Back Propagation Method

An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation

Efficient Vectorized Backpropagation Algorithms for Training Feedforward Networks Composed of Quadratic Neurons

Efficient Biomedical Instance Segmentation via Knowledge Distillation

Correlative Information Maximization: A Biologically Plausible Approach to Supervised Deep Neural Networks without Weight Symmetry

Dual-mode Dendritic Devices Enhanced Neural Network Based on Electrolyte Gated Transistors

Memristive KDG-BNN: Memristive binary neural networks trained via knowledge distillation and generative adversarial networks

Decoupled graph knowledge distillation: A general logits-based method for learning MLPs on graphs