Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks

Gokul Krishnan,Yufei Ma,Yu Cao
DOI: https://doi.org/10.1109/ICSICT49897.2020.9278024
2020-01-01
Abstract:DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties and hardware preferences. As a result, an FPGA accelerator may not directly benefit from such random pruning, with additional cost on indexing and control modules. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in the Small-World graph and the data locality in the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, which follows both the Small-World property and FPGA dataflow preferences, such as grouped non-zero and zero parameters to skip data load and corresponding computation. The pruned model is then trained for a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique for multiple DNNs with different datasets. It achieves state-of-the-art sparsity ratio of up to 76% for CIFAR-10, 84% for CIFAR-100, and 76% for the SVHN datasets. Moreover, the generated sparse DNN achieves up to 4× improvement in throughput for an output stationary FPGA architecture across different DNNs with a marginal hardware overhead.
What problem does this paper attempt to address?