GPU Computations on Hadoop Clusters for Massive Data Processing

Wenbo Chen, Shungou Xu, Hai Jiang, Tien-Hsiung Weng, Mario Donato Marino, Yi-Siang Chen, Kuan-Ching Li
DOI: https://doi.org/10.1007/978-3-319-17314-6_66
2016-01-01
Abstract: Hadoop is a well-designed approach for handling massive amounts of data. Built around the Hadoop File System and MapReduce, it schedules processing by orchestrating distributed servers while providing redundancy and fault tolerance. In terms of performance, however, Hadoop still falls short of high-performance computing because of the limited parallelism of CPUs. GPU-accelerated computing uses a GPU together with a CPU to speed up applications, and GPU clusters can process data with higher efficiency. However, GPU clusters have limited data storage capacity. In this chapter, we exploit a hybrid model of GPU and Hadoop to make the best use of both, and we present the design and implementation of applications that use Hadoop and CUDA through two interfaces: Hadoop Streaming and Hadoop Pipes. Experimental results for the K-means algorithm are presented and their performance is discussed.