SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA

Yufei Ma, Gokul Krishnan, Yu Cao, Le Ye, Ru Huang
DOI: https://doi.org/10.1145/3431920.3439465
2021-01-01
Abstract: State-of-the-art DNN pruning approaches have achieved high sparsity. However, these methods usually do not consider the intrinsic graph properties of DNNs, which leads to irregularly pruned networks. Consequently, hardware accelerators cannot directly benefit from such pruning and incur additional indexing, control, and data-path costs. Inspired by the observation that the brain and real-world networks follow a Small-World model, we propose SWIFT, a graph-based progressive structural pruning technique that integrates local clusters and global sparsity in DNNs to benefit the dataflow and workload balance of accelerators. In particular, we propose an output-stationary FPGA architecture to accelerate DNN inference and integrate it with the structural sparsity produced by SWIFT, so that the communication and computation of clustered zero weights are eliminated. In addition, a full-mesh data router is designed to adaptively direct inputs into the corresponding processing elements (PEs) for different layer configurations and to skip zero operations. SWIFT is evaluated with multiple DNNs on different datasets, achieving sparsity ratios of up to 76% on CIFAR-10, 83% on CIFAR-100, and 76% on SVHN. Moreover, the proposed SWIFT FPGA accelerator achieves up to a 4.4× throughput improvement over different dense networks, with marginal hardware overhead.
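The abstract does not spell out SWIFT's pruning algorithm or its dataflow. The following is only an illustrative Python sketch under stated assumptions. It uses a block-level mask in which each output block keeps a band of nearby input blocks (the local clusters) plus a few randomly chosen long-range blocks (the global sparsity, borrowed from the Watts-Strogatz small-world construction). It pairs this mask with an output-stationary loop that skips pruned blocks entirely. All names and parameters (`small_world_block_mask`, `local_width`, `global_prob`, `block`, `os_block_sparse_matvec`) are hypothetical and are not taken from the paper.

```python
import random


def small_world_block_mask(n_out_blocks, n_in_blocks, local_width, global_prob, seed=0):
    """Hypothetical block-level connectivity mask (True = keep the weight block).

    Local clusters: output block i keeps the input blocks within `local_width`
    of its (rescaled) diagonal position. Global sparsity: every other block is
    kept only with probability `global_prob`, giving a few long-range
    shortcuts in the spirit of a Watts-Strogatz small-world graph.
    """
    rng = random.Random(seed)
    mask = []
    for i in range(n_out_blocks):
        center = i * n_in_blocks // n_out_blocks  # diagonal position for non-square layers
        mask.append([abs(j - center) <= local_width or rng.random() < global_prob
                     for j in range(n_in_blocks)])
    return mask


def apply_block_mask(weights, mask, block):
    """Zero every weight that falls inside a pruned (False) block."""
    return [[w if mask[o // block][k // block] else 0.0
             for k, w in enumerate(row)]
            for o, row in enumerate(weights)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    total = sum(len(row) for row in weights)
    zeros = sum(1 for row in weights for w in row if w == 0.0)
    return zeros / total


def os_block_sparse_matvec(weights, x, mask, block):
    """Output-stationary matrix-vector product that skips pruned blocks.

    Each output block's partial sums stay in place while the kept input
    blocks stream past, so a pruned block costs neither a weight fetch nor
    any multiply-accumulate (MAC).
    """
    y = [0.0] * len(weights)
    macs = 0
    for bi, row in enumerate(mask):
        for bj, keep in enumerate(row):
            if not keep:
                continue  # clustered zero block: no data movement, no compute
            for o in range(bi * block, (bi + 1) * block):
                acc = y[o]
                for k in range(bj * block, (bj + 1) * block):
                    acc += weights[o][k] * x[k]
                    macs += 1
                y[o] = acc
    return y, macs


if __name__ == "__main__":
    block, n_out_blocks, n_in_blocks = 4, 8, 8
    rng = random.Random(1)
    n_out, n_in = block * n_out_blocks, block * n_in_blocks
    dense = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    x = [rng.uniform(-1, 1) for _ in range(n_in)]

    mask = small_world_block_mask(n_out_blocks, n_in_blocks, local_width=1, global_prob=0.1)
    pruned = apply_block_mask(dense, mask, block)
    y, macs = os_block_sparse_matvec(pruned, x, mask, block)

    # Reference: plain dense product on the pruned weights gives the same result.
    ref = [sum(w * v for w, v in zip(row, x)) for row in pruned]
    assert all(abs(a - b) < 1e-9 for a, b in zip(y, ref))
    print(f"sparsity={sparsity(pruned):.2f}, MACs={macs} of {n_out * n_in}")
```

Because the zeros come in whole blocks, the skip decision is made once per block rather than once per weight. This is the kind of regularity that, according to the abstract, irregularly pruned networks lack, which forces accelerators to pay extra indexing and control costs.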