Computational Performance Analysis of Cluster-based Technologies for Big Data Analytics

Mukhtaj Khan, Salman, Nadeem Iqbal
DOI: https://doi.org/10.1109/ithings-greencom-cpscom-smartdata.2017.239
2017-06-01
Abstract: Due to rapid development in the Internet, applications, and communication technology, a huge volume of unstructured data is generated from sources such as social media, sensor networks, online services, healthcare devices, bioinformatics, and computational biology. However, storing this volume of data and processing it in a timely manner pose numerous challenges. Distributed computing platforms such as Hadoop MapReduce and Spark are becoming the major programming models for data-intensive applications. In this paper, we compare the performance of the Hadoop MapReduce and Spark programming models in terms of computational efficiency. For the comparison, we employ three applications, WordCount, Sort, and PageRank, with input datasets of varied size. The experimental results show that Spark outperforms Hadoop MapReduce in all cases.
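For readers unfamiliar with the WordCount benchmark named in the abstract, the following is a minimal single-process sketch of the map, shuffle, and reduce phases that both programming models distribute across a cluster. It is an illustration only, not the authors' benchmark code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions introduced here, and it uses neither the Hadoop nor the Spark API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every whitespace-separated token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum the per-word counts collected for one key.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence on an in-memory list of lines.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["big data on Spark", "big data on Hadoop"])
```

In a cluster setting the map and reduce steps would run in parallel on partitions of the input. One commonly cited difference is that Hadoop MapReduce writes intermediate results to disk between stages, whereas Spark can keep them in memory, which matters most for iterative workloads such as PageRank.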