Acorns: A Framework for Accelerating Deep Neural Networks with Input Sparsity

Xiao Dong, Lei Liu, Peng Zhao, Guangli Li, Jiansong Li, Xueying Wang, Xiaobing Feng
DOI: https://doi.org/10.1109/PACT.2019.00022
2019-01-01
Abstract: Deep neural networks have been employed in a broad range of applications, including face detection, natural language processing, and autonomous driving. Yet, the neural networks with the capability to tackle real-world problems are intrinsically expensive in computation, hindering the usage of these models. Sparsity in the input data of neural networks provides an optimizing opportunity. However, harnessing the potential performance improvement on modern CPUs faces challenges raised by sparse computations of the neural network, such as cache-unfriendly memory accesses and efficient sparse kernel implementation. In this paper, we propose Acorns, a framework to accelerate deep neural networks with input sparsity. In Acorns, sparse input data is organized into our designed sparse data layout, which allows memory-friendly access for kernels in neural networks and opens the door for many performance-critical optimizations. Upon that, Acorns generates efficient sparse kernels for operators in neural networks from kernel templates, which combine directions that express specific optimizing transformations to be performed, and straightforward code that describes the computation. Comprehensive evaluations demonstrate Acorns can outperform state-of-the-art baselines by significant speedups. On the real-world detection task in autonomous driving, Acorns demonstrates 1.8-22.6× performance improvement over baselines. Specifically, the generated programs achieve 1.8-2.4× speedups over Intel MKL-DNN, 3.0-8.8× speedups over TensorFlow, and 11.1-13.2× speedups over Intel MKL-Sparse.