A Comprehensive Analysis of Low-Impact Computations in Deep Learning Workloads

Hengyi Li, Zhichen Wang, Xuebin Yue, Wenwen Wang, Hiroyuki Tomiyama, Lin Meng
DOI: https://doi.org/10.1145/3453688.3461747
2021-01-01
Abstract: Deep Neural Networks (DNNs) have achieved great success in a wide range of machine learning tasks across many domains. Although multiple hardware platforms are available, such as GPUs, CPUs, and FPGAs, CPUs remain a preferred choice for machine learning applications, especially in low-power, resource-constrained computing environments such as embedded systems. In such environments, however, power and performance efficiency become critical issues when applying DNN techniques. An attractive optimization for DNNs is to remove redundant computations to improve execution efficiency. To this end, this paper conducts extensive experiments and analyses on popular state-of-the-art deep learning models. The experimental results include the number of instructions, branches, branch prediction misses, and cache misses, among other metrics, recorded during model execution. We also investigate the performance and sparsity of each layer in the models. Based on these analysis results, this paper proposes an instruction-level optimization that improves performance by 10.26% to 28.0% for certain convolution layers.
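The abstract links layer sparsity to removing redundant computations but does not describe the instruction-level optimization itself. The following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that "low-impact" computations are multiply-accumulates (MACs) whose input activation is exactly zero, as commonly happens after ReLU. The helper names `sparsity` and `dot_skip_zeros` are hypothetical. On a real CPU, the data-dependent `if` used to skip work is itself a branch that can be mispredicted, which is one reason branch and branch-misprediction counts are relevant to this kind of analysis.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; not taken from the paper.

def sparsity(values: List[float]) -> float:
    """Fraction of exactly-zero entries, e.g. activations after ReLU."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == 0.0) / len(values)

def dot_skip_zeros(inputs: List[float], weights: List[float]) -> Tuple[float, int]:
    """Dot product that skips MACs whose input is zero.

    Returns (result, number of MACs actually executed).
    """
    acc = 0.0
    macs = 0
    for x, w in zip(inputs, weights):
        if x == 0.0:
            continue  # a zero input contributes nothing, so skip the MAC
        acc += x * w
        macs += 1
    return acc, macs

# Toy activations passed through ReLU, giving [0.0, 2.0, 0.0, 0.0, 4.0, 0.0].
acts = [max(0.0, v) for v in [-1.5, 2.0, -0.3, 0.0, 4.0, -2.2]]
weights = [0.5, 1.0, -1.0, 2.0, 0.25, 3.0]

print(sparsity(acts))                 # 4 of 6 entries are zero
print(dot_skip_zeros(acts, weights))  # (3.0, 2): only 2 of 6 MACs executed
```

Here two-thirds of the inputs are zero, so the skipping version executes 2 MACs instead of 6 while producing the same result as the dense dot product. Whether such skipping pays off in practice depends on how sparse a layer actually is and on the cost of the extra branch.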