Understanding Data Partition for Applications on CPU-GPU Integrated Processors.

Juan Fang, Huanhuan Chen, Junjie Mao
DOI: https://doi.org/10.1007/978-981-10-8890-2_32
2017-01-01
Abstract: Integrating a GPU with a CPU on the same chip is increasingly common in current high-performance processor architectures. The CPU and GPU share the on-chip network, the last-level cache, and memory, so data does not need to be copied back and forth as it does with a discrete GPU. Shared virtual memory, memory coherence, and system-wide atomics have been introduced to heterogeneous architectures and programming models to enable fine-grained CPU-GPU collaboration. Programming models such as OpenCL 2.0, CUDA 8.0, and C++ AMP support these heterogeneous architecture features. Data partitioning is one such collaboration pattern, and balancing the data processed by the CPU and the GPU is essential for improving performance and energy efficiency. In this paper, we first demonstrate that, for one application, the optimal allocation of data to the CPU and GPU can provide 20% higher performance than a fixed ratio of 20%. Second, we evaluate another 5 heterogeneous applications covering the latest architecture features and identify the relation between data partitioning and performance.
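To make the idea of a partition ratio concrete, the following is a minimal sketch in Python and is not the method evaluated in the paper. It assumes a deliberately simple cost model in which the CPU and GPU work on their shares concurrently at constant throughputs, so the total time is the slower of the two. The function names (`split_index`, `modeled_time`, `best_ratio`) and the throughput parameters are hypothetical and chosen only for illustration.

```python
def split_index(n_items, cpu_ratio):
    """Return k such that the CPU gets items [0, k) and the GPU gets [k, n_items)."""
    if not 0.0 <= cpu_ratio <= 1.0:
        raise ValueError("cpu_ratio must be in [0, 1]")
    return round(n_items * cpu_ratio)


def modeled_time(n_items, cpu_ratio, cpu_rate, gpu_rate):
    """Modeled completion time under the ASSUMED cost model.

    cpu_rate and gpu_rate are hypothetical throughputs (items per time unit).
    Both devices run concurrently, so the slower share determines the total.
    """
    k = split_index(n_items, cpu_ratio)
    return max(k / cpu_rate, (n_items - k) / gpu_rate)


def best_ratio(n_items, cpu_rate, gpu_rate, steps=20):
    """Sweep candidate CPU ratios 0, 1/steps, ..., 1 and return the fastest one."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda r: modeled_time(n_items, r, cpu_rate, gpu_rate),
    )


if __name__ == "__main__":
    # Hypothetical numbers: the GPU is modeled as three times faster than the CPU.
    n, cpu_rate, gpu_rate = 1000, 1.0, 3.0
    ratio = best_ratio(n, cpu_rate, gpu_rate)
    print("best CPU ratio:", ratio)
    print("modeled time at best ratio:", modeled_time(n, ratio, cpu_rate, gpu_rate))
    print("modeled time at fixed 0.5:", modeled_time(n, 0.5, cpu_rate, gpu_rate))
```

On real integrated processors the throughputs are not constant and depend on the application, contention for the shared cache and memory, and synchronization cost, so in practice the ratio would be found by measurement rather than from a closed-form model like this one.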