Enabling High Performance Deep Learning Networks on Embedded Systems

Qian Li, Qingcheng Xiao, Yun Liang
DOI: https://doi.org/10.1109/iecon.2017.8217476
2017-01-01
Abstract: Deep learning is one of the most popular research topics in computer science. In recent years, the widespread use of convolutional neural networks has made them a rapidly developing direction in computer architecture research. There is growing demand for deploying deep learning networks offline on embedded mobile systems. However, reconciling the limited computing and storage resources of embedded platforms with the large storage requirements that grow with network complexity has become a core research problem. In this paper, we explore optimization techniques that enable high-performance deep learning networks on embedded systems from two aspects: neural network design and acceleration on embedded platforms. We focus on convolutional neural networks. First, we combine several techniques and propose a set of pruning mechanisms to save storage resources. We also explore the concept of block-wise sparsity. Second, from the perspective of mobile deployment, we propose a method that automatically selects the optimal convolution/matrix multiplication approach based on the sparsity of the matrix and its sparse structure. Our experiments on the NVIDIA TX1 show that the two approaches can be used together and reinforce each other, improving computational performance while reducing storage consumption.
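
The two ideas named in the abstract, block-wise pruning and choosing a multiplication approach from the sparsity and its structure, can be illustrated with a small sketch. This is not the authors' implementation: the function names (block_prune, sparsity, choose_kernel), the block scoring rule (summed absolute weight), and the thresholds (dense_below, min_fill) are all assumptions made for illustration only.

```python
def block_prune(weights, block, keep_ratio):
    """Zero out whole blocks, keeping roughly keep_ratio of them.

    Blocks are ranked by summed |w|; this scoring rule is an assumption.
    """
    rows, cols = len(weights), len(weights[0])
    assert rows % block == 0 and cols % block == 0
    scores = {}
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            scores[(bi, bj)] = sum(
                abs(weights[i][j])
                for i in range(bi, bi + block)
                for j in range(bj, bj + block)
            )
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    pruned = [[0.0] * cols for _ in range(rows)]
    for bi, bj in kept:
        for i in range(bi, bi + block):
            for j in range(bj, bj + block):
                pruned[i][j] = weights[i][j]
    return pruned


def sparsity(matrix):
    """Fraction of entries that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for v in row if v == 0)
    return zeros / total


def choose_kernel(matrix, block, dense_below=0.5, min_fill=0.8):
    """Pick a multiplication strategy label (thresholds are hypothetical).

    - "dense":        too few zeros for a sparse format to pay off
    - "block_sparse": zeros are concentrated in whole blocks
    - "csr":          zeros are scattered without block structure
    """
    if sparsity(matrix) < dense_below:
        return "dense"
    rows, cols = len(matrix), len(matrix[0])
    nonzero_blocks = 0
    nnz = 0
    for bi in range(0, rows, block):
        for bj in range(0, cols, block):
            count = sum(
                1
                for i in range(bi, min(bi + block, rows))
                for j in range(bj, min(bj + block, cols))
                if matrix[i][j] != 0
            )
            if count:
                nonzero_blocks += 1
                nnz += count
    if nonzero_blocks == 0:
        return "csr"
    # Fill ratio: how densely the surviving blocks are populated.
    fill = nnz / (nonzero_blocks * block * block)
    return "block_sparse" if fill >= min_fill else "csr"


W = [
    [1.0, 2.0, 0.1, 0.1],
    [3.0, 4.0, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.2],
    [0.1, 0.1, 5.0, 6.0],
]
P = block_prune(W, block=2, keep_ratio=0.5)
print(sparsity(P), choose_kernel(P, block=2))
```

In a real deployment the selection would presumably be driven by measured kernel timings on the target device rather than fixed thresholds; the constants here only show the shape of the decision.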