Abstract: Sparse training is a promising technique for reducing the computational cost of DNNs while retaining high accuracy. In particular, N:M fine-grained structured sparsity, where at most N out of every M consecutive elements can be nonzero, has attracted attention due to its hardware-friendly pattern and its ability to reach a high sparsity ratio. However, the potential to accelerate N:M sparse DNN training has not been fully exploited, and efficient hardware supporting N:M sparse training is lacking. To tackle these challenges, this paper presents a computation-efficient training scheme for N:M sparse DNNs using algorithm, architecture, and dataflow co-design. At the algorithm level, a bidirectional weight pruning method, dubbed BDWP, is proposed to leverage the N:M sparsity of weights during both the forward and backward passes of DNN training, which significantly reduces the computational cost while maintaining model accuracy. At the architecture level, a sparse accelerator for DNN training, namely SAT, is developed to support both regular dense operations and computation-efficient N:M sparse operations. At the dataflow level, multiple optimization methods, including interleave mapping, pre-generation of N:M sparse weights, and offline scheduling, are proposed to boost the computational efficiency of SAT. Finally, the effectiveness of the training scheme is evaluated on a Xilinx VCU1525 FPGA card using various DNN models and datasets. Experimental results show that the SAT accelerator with the BDWP sparse training method at a 2:8 sparsity ratio achieves an average speedup of 1.75x over dense training on the same accelerator, with a negligible average accuracy loss of 0.56%. Furthermore, the proposed training scheme improves training throughput by 2.97x to 25.22x and energy efficiency by 1.36x to 3.58x over prior FPGA-based accelerators.
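
To make the N:M pattern concrete, the following is a minimal NumPy sketch of pruning a weight tensor so that each group of M consecutive elements keeps at most N nonzeros. It is an illustration only, not the BDWP method: the abstract does not state how the surviving weights are selected, so keeping the largest-magnitude entries is an assumption, and the function name `nm_prune` is hypothetical.

```python
import numpy as np


def nm_prune(weights, n=2, m=8):
    """Zero all but n entries in each group of m consecutive elements.

    Assumption (not from the abstract): the n entries kept in each group
    are the ones with the largest magnitude. Groups are formed along the
    last axis, whose length must be divisible by m.
    """
    w = np.asarray(weights, dtype=float)
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    if w.shape[-1] % m != 0:
        raise ValueError("last dimension must be divisible by m")

    # One row per group of m consecutive elements.
    groups = w.reshape(-1, m)

    # Column indices of the (m - n) smallest-magnitude entries per group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]

    # Start from an all-True mask and clear the dropped positions.
    mask = np.ones(groups.shape, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)

    pruned = np.where(mask, groups, 0.0)
    return pruned.reshape(w.shape), mask.reshape(w.shape)


# Usage: a 2:8 pattern keeps 2 of every 8 weights, i.e. 25% density.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16))
pruned, mask = nm_prune(w, n=2, m=8)
nonzeros_per_group = mask.reshape(-1, 8).sum(axis=1)
density = mask.mean()
```

Because every group has the same number of surviving entries, the nonzeros can be stored as a fixed-size block of values plus small per-group indices, which is what makes the pattern regular enough for hardware to exploit.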

Efficient Neural Network Training Via Forward and Backward Propagation Sparsification

Deep Neural Network Acceleration with Sparse Prediction Layers

Memorized Sparse Backpropagation

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method

SparseTrain: Exploiting Dataflow Sparsity for Efficient Convolutional Neural Networks Training

Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Always-Sparse Training by Growing Connections with Guided Stochastic Exploration

Meprop: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting

Take A Shortcut Back: Mitigating the Gradient Vanishing for Training Spiking Neural Networks

ssProp: Energy-Efficient Training for Convolutional Neural Networks with Scheduled Sparse Back Propagation

Accelerating CNN Training by Pruning Activation Gradients

Sparse optimization guided pruning for neural networks

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

Neural Network Compression Via Sparse Optimization

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

Compressing Deep Neural Networks With Sparse Matrix Factorization

Fast Sparse Deep Neural Networks: Theory and Performance Analysis

Differentiable Sparsification for Deep Neural Networks

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization