Optimizing HPL Benchmark on Multi-GPU Clusters

CHEN Ren-zhi, HUANG Li-bo, CHEN Xu-hao, WANG Zhi-ying
DOI: https://doi.org/10.3969/j.issn.1002-137x.2013.03.023
2013-01-01
Computer Science
Abstract: Current GPU-accelerated versions of the High Performance Linpack (HPL) benchmark usually employ a performance-based dynamic load balancing algorithm. However, this algorithm does not perform well on multi-GPU clusters. On such nodes, each GPU handles a smaller share of the computation, and the gap between total GPU performance and CPU performance is larger. This article therefore proposes an experience-based dynamic load balancing algorithm and a multi-GPU adaptive load balancing algorithm. Tested on a multi-GPU cluster, the two algorithms achieve a 6.3% speedup over NVIDIA's latest GPU-accelerated HPL.
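The abstract names the baseline performance-based scheme without describing it. The following is a minimal sketch of the general idea, not the authors' experience-based or multi-GPU adaptive algorithm and not NVIDIA's code. It assumes three things: the trailing-matrix update (DGEMM) is split by columns between the GPU(s) and the CPU, each side's throughput is measured on the previous iteration, and the next iteration is split in proportion to those throughputs. The function names `update_gpu_fraction` and `split_columns` are hypothetical.

```python
def update_gpu_fraction(gpu_work, gpu_time, cpu_work, cpu_time):
    """Return the fraction of the next update to assign to the GPU side.

    Hypothetical performance-based rule: split work in proportion to the
    throughput each side achieved on the previous iteration, so both are
    expected to finish at about the same time.
    """
    gpu_rate = gpu_work / gpu_time  # e.g. GFLOP per second, all GPUs combined
    cpu_rate = cpu_work / cpu_time  # e.g. GFLOP per second on the host CPU
    return gpu_rate / (gpu_rate + cpu_rate)


def split_columns(n_cols, gpu_fraction):
    """Split n_cols trailing-matrix columns into (gpu_cols, cpu_cols)."""
    gpu_cols = round(n_cols * gpu_fraction)
    return gpu_cols, n_cols - gpu_cols


if __name__ == "__main__":
    # One GPU at 900 GFLOP/s versus a CPU at 100 GFLOP/s (illustrative numbers).
    f = update_gpu_fraction(900.0, 1.0, 100.0, 1.0)
    print(f, split_columns(1000, f))
    # Four such GPUs: the CPU's share of the work shrinks to under 3%.
    print(update_gpu_fraction(3600.0, 1.0, 100.0, 1.0))
```

With the single-GPU numbers, the GPU fraction is 0.9 and 1000 columns split as (900, 100). With four GPUs, the fraction rises to about 0.973. This illustrates the larger GPU/CPU performance gap that the abstract identifies on multi-GPU nodes.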