Parallel Data Mining on CUDA-enabled Graphics Processing Unit (GPU)

Liu Ying, Jian Liheng, Liang Shenshen, Li Xiaojun, Gao Yang, Wang Cheng
2010-01-01
Abstract: Data mining is a widely used technique with significant applications in various domains. However, current data mining toolkits cannot meet the speed requirements of applications with large-scale databases. Parallel data mining techniques are therefore in demand. Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high-performance computing for general-purpose applications. The Compute Unified Device Architecture (CUDA) programming model provides programmers with adequate C-like APIs to better exploit the parallel power of the GPU. In this paper, we introduce GUCAS-Miner, a parallel data miner on a GPU + CPU heterogeneous parallel architecture using CUDA. We present the implementation of three representative data mining algorithms: CU-Apriori, CU-KNN and CU-K-means. GPU kernels are launched to parallelize the computationally intensive portions of the serial code of each algorithm. Several optimization techniques are applied to maximize concurrency and bandwidth. The parallel implementations significantly outperform the corresponding state-of-the-art CPU implementations on different datasets on an HP xw8600 workstation with a Tesla C1060 GPU and a quad-core Intel CPU. Our results show that a CUDA-enabled GPU + CPU parallel architecture is feasible and promising for data mining applications.