Hybrid Map Task Scheduling for GPU-Based Heterogeneous Clusters

Koichi Shirahata,Hitoshi Sato,Satoshi Matsuoka
DOI: https://doi.org/10.1109/cloudcom.2010.55
2010-11-01
Abstract:MapReduce is a programming model that enables efficient massive data processing in large-scale computing environments such as supercomputers and clouds. Such large-scale computers employ GPUs to enjoy its good peak performance and high memory bandwidth. Since the performace of each job is depending on running application characteristics and underlying computing environments, scheduling MapReduce tasks onto CPU cores and GPU devices for efficient execution is difficult. To address this problem, we have proposed a hybrid scheduling technique for GPU-based computer clusters, which minimizes the execution time of a submitted job using dynamic profiles of Map tasks running on CPU cores and GPU devices. We have implemented a prototype of our proposed scheduling technique by extending MapReduce framework, Hadoop. We have conducted some experiments for this prototype by using a K-means application as a benchmark on a supercomputer. The results show that the proposed technique achieves 1.93 times faster than the Hadoop original scheduling algorithm at 64 nodes (1024 CPU cores and 128 GPU devices). The results also indicate that the performance of map tasks, including both CPU and GPU tasks, is significantly affected by the overhead of map task invocation in the Hadoop framework.
What problem does this paper attempt to address?