iMLBench: A Machine Learning Benchmark Suite for CPU-GPU Integrated Architectures
Chenyang Zhang,Feng Zhang,Xiaoguang Guo,Bingsheng He,Xiao Zhang,Xiaoyong Du
DOI: https://doi.org/10.1109/tpds.2020.3046870
IF: 5.3
2021-07-01
IEEE Transactions on Parallel and Distributed Systems
Abstract:Utilizing heterogeneous accelerators, especially GPUs, to accelerate machine learning tasks has shown to be a great success in recent years. GPUs bring huge performance improvements to machine learning and greatly promote the widespread adoption of machine learning. However, the discrete CPU-GPU architecture design with high PCIe transmission overhead decreases the GPU computing benefits in machine learning training tasks. To overcome such limitations, hardware vendors release CPU-GPU integrated architectures with shared unified memory. In this article, we design a benchmark suite for machine learning training on CPU-GPU integrated architectures, called iMLBench, covering a wide range of machine learning applications and kernels. We mainly explore two features on integrated architectures: 1) zero-copy, which means that the PCIe overhead has been eliminated for machine learning tasks and 2) co-running, which means that the CPU and the GPU co-run together to process a single machine learning task. Our experimental results on iMLBench show that the integrated architecture brings an average 7.1× performance improvement over the original implementations. Specifically, the zero-copy design brings 4.65× performance improvement, and co-running brings 1.78× improvement. Moreover, integrated architectures exhibit promising results from both performance-per-dollar and energy perspectives, achieving 6.50× performance-price ratio while 4.06× energy efficiency over discrete GPUs. The benchmark is open-sourced at https://github.com/ChenyangZhang-cs/iMLBench.
computer science, theory & methods,engineering, electrical & electronic