Abstract: To accelerate practical applications of artificial intelligence, this paper proposes a highly efficient layer-wise refined pruning method for deep neural networks at the software level and accelerates the inference process at the hardware level on a field-programmable gate array (FPGA). The refined pruning operation is based on the channel-wise importance indexes of each layer and the layer-wise input sparsity of the convolutional layers. The method exploits characteristics of the native networks without adding any extra workload to the training phase. In addition, the operation extends easily to various state-of-the-art deep neural networks. The effectiveness of the method is verified on ResNet and VGG networks using the CIFAR10, CIFAR100, and ImageNet100 datasets. Experimental results show that for ResNet50 on CIFAR10 and ResNet101 on CIFAR100, more than 85% of the parameters and floating-point operations (FLOPs) are pruned with only 0.35% and 0.40% accuracy loss, respectively. For the VGG network, 87.05% of the parameters and 75.78% of the FLOPs are pruned with only 0.74% accuracy loss for VGG13BN on CIFAR10. Furthermore, we accelerate the networks at the hardware level on the FPGA platform using the Vitis AI tool. In two-thread mode on the FPGA, the pruned VGG13BN and ResNet101 reach throughputs of 151.99 fps and 124.31 fps, respectively, which corresponds to speedups of about 4.3× and 1.8× over the original networks on the FPGA.
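The abstract does not specify how the channel-wise importance indexes or the layer-wise input sparsity are computed or combined, so the following is only a minimal sketch of the general idea under stated assumptions. The function names, the linear rule that maps input sparsity to a per-layer pruning ratio, and the example importance scores are all hypothetical and are not taken from the paper.

```python
def layer_prune_ratio(input_sparsity, base_ratio=0.5, max_ratio=0.9):
    """Hypothetical rule: a layer whose input is sparser is pruned harder.

    input_sparsity is the fraction of zeros in the layer's input activations.
    The linear interpolation is an assumption made for illustration only.
    """
    ratio = base_ratio + (max_ratio - base_ratio) * input_sparsity
    return min(max(ratio, 0.0), max_ratio)


def select_channels(importance, prune_ratio):
    """Keep the most important output channels; always keep at least one."""
    n = len(importance)
    n_keep = max(1, n - int(round(n * prune_ratio)))
    ranked = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:n_keep])


def conv_cost(c_in, c_out, kernel, h_out, w_out):
    """Parameter and multiply-accumulate counts of a conv layer (bias ignored)."""
    params = c_in * c_out * kernel * kernel
    macs = params * h_out * w_out
    return params, macs


# Made-up importance scores for an 8-channel layer (not data from the paper).
importance = [0.9, 0.05, 0.4, 0.01, 0.7, 0.02, 0.3, 0.6]
ratio = layer_prune_ratio(input_sparsity=0.5)
kept = select_channels(importance, ratio)

params_before, macs_before = conv_cost(16, len(importance), 3, 32, 32)
params_after, macs_after = conv_cost(16, len(kept), 3, 32, 32)

print("kept channels:", kept)
print("parameters pruned: %.1f%%" % (100 * (1 - params_after / params_before)))
print("MACs pruned: %.1f%%" % (100 * (1 - macs_after / macs_before)))
```

In this toy setting, removing output channels reduces the parameter and operation counts of the layer by the same fraction. In a full network the input channels of the following layer shrink as well, which is one reason the reported parameter and FLOPs reductions can differ from each other.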

Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks

SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA

Regularization-Free Structural Pruning for GPU Inference Acceleration

Class-Aware Pruning for Efficient Neural Networks

Structural Pruning in Deep Neural Networks: A Small-World Approach

Efficient Structure Slimming for Spiking Neural Networks

Single-shot Pruning and Quantization for Hardware-Friendly Neural Network Acceleration

Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA

Structured Term Pruning for Computational Efficient Neural Networks Inference

Intermediate-grained kernel elements pruning with structured sparsity

An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference

PruneAug: Bridging DNN Pruning and Inference Latency on Diverse Sparse Platforms Using Automatic Layerwise Block Pruning

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Adaptive Activation-based Structured Pruning

Highly Efficient Sparse Neural Network Computing - Hardware and Software Solutions

Connection Pruning for Deep Spiking Neural Networks with On-Chip Learning

One-Cycle Pruning: Pruning ConvNets Under a Tight Training Budget

Work-in-Progress: A High-performance FPGA Accelerator for Sparse Neural Networks

Balanced Sparsity for Efficient DNN Inference on GPU

An efficient pruning and fine-tuning method for deep spiking neural network

A Dynamic Pruning Method on Multiple Sparse Structures in Deep Neural Networks