Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism.

Dong Jianqiang, Wang Fei, Yuan Bo
DOI: https://doi.org/10.1007/978-3-642-41278-3_50
2013-01-01
Abstract: In this big data era, the capability of mining and analyzing large-scale datasets is imperative. As data become more abundant than ever before, data-driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature of the latest CUDA platform can bring significant benefits to BIRCH, one of the best-known clustering techniques for streaming data. Experimental results show that, on a number of benchmark problems, the GPU-accelerated BIRCH can be made up to 154 times faster than the CPU version, with good scalability and high accuracy. Our work suggests that massively parallel GPU computing is a promising and effective solution to the challenges of big data. © 2013 Springer-Verlag.