Abstract: The rapid development of neural networks has come at the cost of increased computational complexity. Neural networks are both computationally and memory intensive, so the limited energy and computing power available on satellites make automatic target recognition (ATR) challenging. Knowledge distillation (KD) transfers the essential information learned by a cumbersome teacher network to a lightweight student network, and can therefore be used to improve the accuracy of the student. Even when it learns from a teacher, however, the student network still contains redundancy. Traditional networks fix their structure before training, so training alone cannot remove this redundancy. This paper proposes a distillation sparsity training (DST) algorithm, based on KD and network pruning, to address these limitations. We first improve the accuracy of the student network through KD and then apply network pruning, allowing the student network to learn which connections are essential. DST allows the teacher network to teach the pruned student network directly. The proposed algorithm was tested on the CIFAR-100, MSTAR, and FUSAR-Ship data sets with a sparsity setting of 50%. First, a new loss function for the teacher-pruned student was proposed, and the pruned student network achieved performance close to that of the teacher network. Second, a new sparsity model, uniformity half-pruning (UHP), was designed to address the fact that unstructured pruning is poorly suited to acceleration and storage on general-purpose hardware. Compared with traditional unstructured pruning, UHP can double the speed of neural networks.
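The abstract combines two ingredients: a distillation loss and a 50% pruning mask. Neither the new teacher-pruned-student loss nor the exact UHP pattern is specified here, so the sketch below makes explicit assumptions. `kd_loss` is the standard Hinton-style soft-target loss, not the paper's loss, and `half_pruning_mask` is a hypothetical reading of "uniformity half-pruning" in which every group of `group_size` consecutive weights keeps exactly half of its largest-magnitude entries, similar to N:M sparsity. All function and parameter names (`temperature`, `alpha`, `group_size`) are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard Hinton-style KD loss for one example (assumed, not the DST loss).

    alpha weights the soft-target KL(teacher || student) term, and (1 - alpha)
    weights hard-label cross-entropy. The KL term is scaled by T^2 so that its
    gradient magnitude stays comparable as the temperature changes.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def unstructured_mask(weights, sparsity=0.5):
    """Magnitude pruning over the whole tensor: zero the smallest |w| anywhere."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def half_pruning_mask(weights, group_size=4):
    """Hypothetical uniform half-pruning: in every group of `group_size`
    consecutive weights, keep the group_size // 2 largest |w|. Every group
    ends up with the same number of nonzeros, so the pattern is regular."""
    if len(weights) % group_size:
        raise ValueError("length must be a multiple of group_size")
    mask = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        ranked = sorted(range(group_size), key=lambda i: abs(group[i]),
                        reverse=True)
        keep = set(ranked[:group_size // 2])
        mask.extend(1 if i in keep else 0 for i in range(group_size))
    return mask


if __name__ == "__main__":
    w = [0.9, 0.8, 0.7, 0.6, 0.05, -0.1, 0.02, 0.03]
    print(unstructured_mask(w))   # [1, 1, 1, 1, 0, 0, 0, 0]: second group fully pruned
    print(half_pruning_mask(w))   # [1, 1, 0, 0, 1, 1, 0, 0]: two nonzeros per group
```

Both masks reach 50% sparsity, but only the group-wise one guarantees the same number of nonzeros per group. That kind of regularity lets hardware use fixed-size compressed storage and balanced work per group, which is the general motivation the abstract gives for UHP over unstructured pruning.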

DASNet: Dynamic Activation Sparsity for Neural Network Efficiency Improvement

Class-Aware Pruning for Efficient Neural Networks

BitSNNs: Revisiting Energy-efficient Spiking Neural Networks

Neurogenesis Dynamics-inspired Spiking Neural Network Training Acceleration

DTS: Dynamic Training Slimming with Feature Sparsity for Efficient Convolutional Neural Network

PowerPruning: Selecting Weights and Activations for Power-Efficient Neural Network Acceleration

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Dynamic Sparse Graph for Efficient Deep Learning

DAS: Neural Architecture Search via Distinguishing Activation Score

StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs

Neural Dynamics Pruning for Energy-Efficient Spiking Neural Networks

Dynamic and Adaptive Threshold for DNN Compression from Scratch

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

A systematic DNN weight pruning framework based on symmetric accelerated stochastic ADMM

DANNA: A Dimension-Aware Neural Network Accelerator for Unstructured Sparsity

Distillation Sparsity Training Algorithm for Accelerating Convolutional Neural Networks in Embedded Systems

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

AntiDote: Attention-based Dynamic Optimization for Neural Network Runtime Efficiency

Dual sparse training framework: inducing activation map sparsity via Transformed $\ell_1$ regularization