Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters

Langshi Chen, Bo Peng, Bingjing Zhang, Tony Liu, Yiming Zou, Lei Jiang, Robert Henschel, Craig Stewart, Zhang, Emily Mccallum, Zahniser Tom, Omer Jon, Judy Qiu
DOI: https://doi.org/10.1109/cloud.2017.19
2017-01-01
Abstract: Data analytics is undergoing a revolution in many scientific domains and demands cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like the Xeon Phi have an order of magnitude greater computation power and memory bandwidth. To harness their computing capabilities, we propose the Harp-DAAL framework. We show that enhanced versions of MapReduce can be replaced by Harp, a Hadoop plug-in that offers useful data abstractions for both high-performance iterative computation and MPI-quality communication, and that can drive Intel's native DAAL library. We select a subset of three machine learning algorithms and implement them within Harp-DAAL. Our scalability benchmarks ran on Knights Landing (KNL) clusters and achieved up to a 2.5 times speedup over the HPC solution in NOMAD and a 15 to 40 times speedup over Java-based solutions in Spark. We further quantify the workloads on a single KNL node with a performance breakdown at the micro-architecture level.