Optimizing the MapReduce Framework for CPU-MIC Heterogeneous Cluster

Wenzhu Wang, Qingbo Wu, Yusong Tan, Yaoxue Zhang
DOI: https://doi.org/10.1007/978-3-319-23216-4_3
2015-01-01
Abstract: MapReduce is a distributed programming paradigm for processing large-scale data sets. Meanwhile, with the development of coprocessors, heterogeneous architectures are widely used to achieve high performance. It is therefore natural to leverage both for big data processing. In this paper, we propose an optimized MapReduce framework for CPU-MIC heterogeneous clusters, which provides the following new features. First, a runtime is developed for MIC management, fault tolerance, and task scheduling. Second, we design a SIMD-friendly map and a pipelined reduce to improve resource utilization. In addition, a memory management scheme is implemented for efficient access to <key, value> pairs on the MIC. The experimental results show that our system is up to 2.4x and 8.1x faster than Hadoop for different applications.