A Computing Efficient Hardware Architecture for Sparse Deep Neural Network Computing

Yanwen Zhang, Peng Ouyang, Shouyi Yin, Youguang Zhang, Weisheng Zhao, Shaojun Wei
DOI: https://doi.org/10.1109/icsict.2018.8565755
2018-01-01
Abstract: Convolutional Neural Networks (CNNs) have demonstrated significant performance in artificial intelligence (AI) systems. However, CNNs often need tens or even hundreds of layers with millions of parameters to achieve state-of-the-art performance, which hinders their deployment in resource-limited scenarios. Meanwhile, those parameters and data are usually sparse, which results in useless computation as well as unbalanced workloads. To solve these problems, we propose a computing-efficient hardware architecture. To reduce computational redundancy, we filter out zero-valued weights and zero-valued feature maps. To reduce redundant memory consumption, we propose a memory division scheme and a data reuse mechanism. To resolve load imbalance, we implement a near-zero-cost scheduling switching strategy. Experimental results show that our architecture saves, on average, 22.6% of memory time and 60.5% of computing time compared with the state-of-the-art NN accelerator.
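The zero-filtering idea in the abstract can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the function name `sparse_dot` and the sample vectors are hypothetical, and the code only shows why skipping zero-valued weights and zero-valued activations removes multiply-accumulate (MAC) operations without changing the result.

```python
def sparse_dot(weights, activations):
    """Dot product that skips every pair containing a zero operand.

    Returns (result, macs_performed). This is an illustrative software
    analogue of zero filtering, not the accelerator described in the paper.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0
    macs = 0
    for w, a in zip(weights, activations):
        # A zero weight or zero activation contributes nothing, so skip it.
        if w == 0 or a == 0:
            continue
        total += w * a
        macs += 1
    return total, macs


# Hypothetical sparse inputs: only two of the six pairs are fully non-zero.
weights = [0, 2, 0, -1, 3, 0]
activations = [5, 0, 7, 4, 1, 0]

result, macs = sparse_dot(weights, activations)
dense_result = sum(w * a for w, a in zip(weights, activations))
print(result, macs, len(weights))
```

In this toy case the filtered version performs 2 MACs instead of the 6 a dense computation would need, and both produce the same value. The number of surviving MACs differs from one input to the next, which is one way to see where the load imbalance mentioned in the abstract could come from when such work is spread across parallel units.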