Efficient Statistical Computing on Multicore and MultiGPU Systems

Yulong Ou, Bo Li, Hailong Yang, Zhongzhi Luan, Depei Qian
DOI: https://doi.org/10.1109/nbis.2012.89
2012-01-01
Abstract: R, a statistical programming language for data analysis with a powerful graphics toolkit, is widely used in mathematical computing, biological simulation and medical research. For large-scale workloads such as drug discovery and protein folding, however, R falls short because it usually runs on a single desktop computer. The mismatch is sharper when R is confined to one machine while the rest of the computation is done on a cluster or even a supercomputer. This paper proposes a parallel computing scheme that runs R on both CPU and GPU clusters, which deliver high multi-threaded performance and high parallelism at lower energy consumption. Three statistical algorithms were rewritten: the chi-squared distribution, the Pearson correlation coefficient and the unary linear regression model. Evaluation shows that our implementation achieves better performance and energy efficiency than its single-threaded competitors. For instance, when the input dataset size reaches 400M, the MPI implementation of the chi-squared distribution on a four-node cluster achieves a speedup of nearly 20x, while the CUDA implementation achieves a speedup of 5.2x on a single GPU and more than 15x on a system with three GPUs.
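The abstract does not describe how the algorithms were parallelized. As a rough illustration only, the sketch below shows one common way the Pearson correlation coefficient can be split across workers: each worker computes a count and five sums over its own chunk, and the partial results are added together before the final formula is applied. This is not the authors' implementation. It is written in plain Python rather than R, MPI or CUDA, and the function names (partial_sums, combine, pearson_from_sums, chunked_pearson) and the workers parameter are hypothetical.

```python
from math import sqrt


def partial_sums(xs, ys):
    """Count and five sums for one chunk (what a single worker would compute)."""
    return (
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


def combine(a, b):
    """Merge two partial results element-wise (the reduction step)."""
    return tuple(p + q for p, q in zip(a, b))


def pearson_from_sums(s):
    """Apply the textbook sum-based formula for Pearson's r."""
    n, sx, sy, sxx, syy, sxy = s
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov / sqrt(var_x * var_y)


def chunked_pearson(xs, ys, workers=4):
    """Split the data into chunks, reduce the partial sums, then compute r.

    Assumes xs and ys are non-empty, equal in length and not constant.
    The loop runs sequentially here; in a real parallel setting each chunk
    would be handled by a separate process, node or GPU.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and equal in length")
    size = (len(xs) + workers - 1) // workers  # ceiling division
    total = (0, 0.0, 0.0, 0.0, 0.0, 0.0)
    for start in range(0, len(xs), size):
        chunk = partial_sums(xs[start:start + size], ys[start:start + size])
        total = combine(total, chunk)
    return pearson_from_sums(total)
```

Because the per-chunk results are merged by simple addition, the same partial sums also give the slope of a one-variable linear regression, (n*sxy - sx*sy) / (n*sxx - sx*sx), without a second pass over the data.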