Introduction to Harp: when Big Data Meets HPC

Bo Peng, Langshi Chen, Yiming Zhou, Judy Qiu
2017-01-01
Abstract: Data analytics is undergoing a revolution in many scientific domains, demanding cost-effective parallel data analysis techniques. We consider the challenges of creating a high-performance data analysis software framework in the context of the current HPC-ABDS software stack (High Performance Computing enhanced Apache Big Data Stack) [1]. We have compiled a list of current data processing software from both HPC and commercial sources [2]. Many critical components of the commodity stack (such as Hadoop) come from Apache open source projects for community usage, while HPC techniques (such as collective communication) are needed to deliver performance and other parallel computing capabilities.