GPU computing using concurrent kernels: A case study

Fengshun Lu, Junqiang Song, Fukang Yin, Xiaoqian Zhu
DOI: https://doi.org/10.1007/978-3-642-25766-7-23
2012-01-01
Abstract: With the rapid evolution of processor architectures, more attention has been paid to hardware-oriented numeric applications. Based on the newly released Fermi architecture, we investigate how to accelerate high performance computing (HPC) applications with concurrent kernels. We concentrate on two performance factors, namely the launching order of concurrent kernels and the kernel granularity. Extensive experiments show that the launching order of concurrent kernels hardly affects application performance. In particular, we identify a kernel-granularity heuristic that may result in the best performance: the occupancy of each kernel should lie in the interval [30%, 50%]. © Springer-Verlag Berlin Heidelberg 2012.
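To make the occupancy heuristic concrete, here is a minimal sketch that estimates a kernel's occupancy on a Fermi-class streaming multiprocessor (SM) and checks it against the [30%, 50%] interval. It is not the authors' method. It assumes the usual definition of occupancy (resident warps divided by the maximum warps an SM supports) and the compute capability 2.0 limits of 48 warps and 8 blocks per SM, and it ignores register and shared-memory limits. The function names `occupancy` and `in_suggested_range` are hypothetical.

```python
import math

# Per-SM limits for Fermi (compute capability 2.0).
# Assumption: register and shared-memory limits are ignored in this sketch.
WARP_SIZE = 32
MAX_WARPS_PER_SM = 48
MAX_BLOCKS_PER_SM = 8


def occupancy(threads_per_block: int, blocks_per_sm: int) -> float:
    """Fraction of an SM's warp slots used by the kernel's resident blocks."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Resident blocks are capped by the block limit and by the warp limit.
    resident_blocks = min(blocks_per_sm, MAX_BLOCKS_PER_SM,
                          MAX_WARPS_PER_SM // warps_per_block)
    return resident_blocks * warps_per_block / MAX_WARPS_PER_SM


def in_suggested_range(occ: float, low: float = 0.30, high: float = 0.50) -> bool:
    """True if the occupancy lies in the [30%, 50%] interval from the abstract."""
    return low <= occ <= high


if __name__ == "__main__":
    for threads, blocks in [(256, 2), (256, 6), (128, 1)]:
        occ = occupancy(threads, blocks)
        print(f"{threads} threads x {blocks} blocks/SM: "
              f"occupancy {occ:.1%}, in range: {in_suggested_range(occ)}")
```

Under these assumptions, a kernel that places two 256-thread blocks on each SM uses 16 of 48 warp slots, which falls inside the suggested interval and leaves room for other kernels to run concurrently on the same SM.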