Crane: Mitigating Accelerator Under-utilization Caused by Sparsity Irregularities in CNNs
Yijin Guan, Guangyu Sun, Zhihang Yuan, Xingchen Li, Ningyi Xu, Shu Chen, Jason Cong, Yuan Xie
DOI: https://doi.org/10.1109/TC.2020.2981080
IF: 3.183
2020-01-01
IEEE Transactions on Computers
Abstract: Convolutional neural networks (CNNs) have achieved great success in numerous AI applications. To improve the inference efficiency of CNNs, researchers have proposed various pruning techniques that reduce both computation intensity and storage overhead. These pruning techniques introduce multi-level sparsity irregularities in CNNs. Combined with the sparsity irregularity in activation matrices, which is induced by the ReLU activation function, they cause serious under-utilization of computation resources in sparse CNN accelerators. To mitigate this problem, we propose a load-balancing method based on a workload stealing technique. We demonstrate that this method can be applied to two major inference data-flows, which cover all state-of-the-art sparse CNN accelerators. Based on this method, we present an accelerator, called Crane, which addresses all kinds of sparsity irregularities in CNNs. We perform a fair comparison between Crane and state-of-the-art prior approaches. Experimental results show that, compared to its counterparts, Crane improves performance by 27% to 88% and reduces energy consumption by 16% to 48%.
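The under-utilization problem and the general idea of workload stealing can be illustrated with a toy model. The sketch below is not Crane's microarchitecture: it assumes each processing element (PE) is assigned a count of nonzero multiply-accumulate operations, that one operation takes one cycle, and that stealing has no overhead. The function names (`static_cycles`, `stealing_cycles`, `utilization`) are hypothetical and chosen only for illustration.

```python
def static_cycles(loads):
    """Cycles needed when each PE processes only its own nonzero work.

    The slowest (most loaded) PE determines the finish time.
    """
    return max(loads, default=0)


def stealing_cycles(loads):
    """Cycles needed when idle PEs steal work (idealized, zero overhead).

    Each cycle, a PE with remaining work processes one unit of its own.
    A PE with no work steals one unit from the currently most loaded PE.
    """
    remaining = list(loads)
    cycles = 0
    while any(remaining):
        busy = [i for i, r in enumerate(remaining) if r > 0]
        idle_count = len(remaining) - len(busy)
        # Busy PEs consume one unit of their own work.
        for i in busy:
            remaining[i] -= 1
        # Idle PEs each steal one unit from the most loaded PE, if any is left.
        for _ in range(idle_count):
            victim = max(range(len(remaining)), key=remaining.__getitem__)
            if remaining[victim] == 0:
                break
            remaining[victim] -= 1
        cycles += 1
    return cycles


def utilization(loads, cycles):
    """Fraction of PE-cycles spent on useful work."""
    if cycles == 0 or not loads:
        return 0.0
    return sum(loads) / (len(loads) * cycles)


if __name__ == "__main__":
    # Irregular sparsity: one PE received far more nonzeros than the others.
    loads = [8, 1, 1, 2]
    for name, fn in (("static", static_cycles), ("stealing", stealing_cycles)):
        c = fn(loads)
        print(f"{name}: {c} cycles, utilization {utilization(loads, c):.3f}")
```

With the example load `[8, 1, 1, 2]`, the static assignment leaves three PEs idle for most of the run, while the idealized stealing schedule keeps every PE busy. A real accelerator pays for stealing in data movement and control logic, which this model deliberately ignores.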