Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems

Junli Gu, Maohua Zhu, Zhitao Zhou, Feng Zhang, Zhen Lin, Qianfeng Zhang, Mauricio Breternitz
DOI: https://doi.org/10.1145/2637166.2637229
2014-01-01
Abstract: Deep Neural Networks (DNNs), with many layers and very high-dimensional parameter spaces, have demonstrated breakthrough learning capability in machine learning. DNNs trained on Big Data are now leading a new direction in large-scale object recognition. DNN training requires vast amounts of computing power, which poses a great challenge to system design. DNN training exhibits massive thread and data parallelism, which maps naturally onto GPUs. Available heterogeneous systems include discrete CPUs paired with GPUs and chip-level integrated CPU+GPU heterogeneous processors, named APUs. In this paper, we explore the implementation of DNN models on different heterogeneous platforms to provide a systematic evaluation and comparison. Specifically, we implement two well-known DNN kernels, the Multi-Layer Perceptron (MLP) and the Autoencoder, on various GPUs and APUs from mainstream processor manufacturers. Evaluation results show that GPUs are faster than APUs, but at the cost of much higher power consumption. APUs achieve up to 2x higher performance per watt, which indicates that APU servers can be an energy-efficient, high-density solution for accelerating DNN applications. This paper also conducts a bottleneck analysis and presents optimization techniques for the various platforms.
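To make concrete why MLP and Autoencoder training "maps naturally onto GPUs", the sketch below shows that a fully connected layer applied to a minibatch reduces to one dense matrix multiply (GEMM) plus an element-wise activation, i.e. millions of independent multiply-adds. It also shows how a performance-per-watt figure is computed from throughput and power. This is a minimal NumPy illustration, not the authors' implementation: the sigmoid activation, the tied-weight autoencoder (decoder uses `W.T`), the layer sizes, and the helper names `layer_forward`, `autoencoder_reconstruction_error`, `gemm_flops` and `perf_per_watt` are all assumptions chosen for illustration, and the numbers in the usage lines are hypothetical, not measurements from the paper.

```python
import numpy as np


def sigmoid(x):
    # Element-wise logistic activation (assumed; the abstract names no activation).
    return 1.0 / (1.0 + np.exp(-x))


def layer_forward(X, W, b):
    """One fully connected layer on a minibatch: sigmoid(X @ W + b).

    X: (batch, n_in), W: (n_in, n_out), b: (n_out,).
    The X @ W product is a single GEMM, the data-parallel core of MLP training.
    """
    return sigmoid(X @ W + b)


def autoencoder_reconstruction_error(X, W, b_hidden, b_visible):
    """Hypothetical tied-weight autoencoder: encode with W, decode with W.T,
    and return the mean squared reconstruction error."""
    H = layer_forward(X, W, b_hidden)        # encoder GEMM
    X_hat = layer_forward(H, W.T, b_visible)  # decoder GEMM
    return float(np.mean((X - X_hat) ** 2))


def gemm_flops(batch, n_in, n_out):
    """FLOPs of X @ W, counting each multiply-add as 2 FLOPs."""
    return 2 * batch * n_in * n_out


def perf_per_watt(throughput, watts):
    """Efficiency metric: work per unit time divided by power draw."""
    return throughput / watts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, n_in, n_hidden = 256, 784, 512  # hypothetical sizes
    X = rng.random((batch, n_in))
    W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)

    print("hidden activations:", layer_forward(X, W, b_h).shape)
    print("reconstruction MSE:", autoencoder_reconstruction_error(X, W, b_h, b_v))
    print("FLOPs per encoder GEMM:", gemm_flops(batch, n_in, n_hidden))
    # Hypothetical devices: a faster device drawing more power can still
    # lose on performance per watt to a slower, lower-power one.
    print("device A:", perf_per_watt(throughput=200.0, watts=200.0))
    print("device B:", perf_per_watt(throughput=100.0, watts=50.0))
```

Because each row of `X` is processed independently and each output element of `X @ W` is an independent dot product, the work spreads across thousands of GPU threads; the same structure is what makes the comparison between discrete GPUs and APUs largely a question of GEMM throughput, memory bandwidth, and power.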