Comparative analysis of Spark and Hadoop through Imputation of Data on Big Datasets

Pooja Choudhary, Kanwal Garg
DOI: https://doi.org/10.1109/ibssc53889.2021.9673461
2021-11-18
Abstract: Various frameworks have been developed to deal with big data in the remote sensing field. This article presents a comparison between two big data processing frameworks, Hadoop and Spark. The Spark framework processes both structured and unstructured data efficiently. The Hadoop framework uses two elements: the Hadoop Distributed File System (HDFS) for storage and MapReduce to process the data. The Hadoop MapReduce system exhibits computational complexity in big data processing, and the execution time of the Spark framework is shorter than that of the Hadoop MapReduce platform. We tested a PARAFAC tensor factorization-based data imputation task on the Hadoop and Spark platforms. The data imputation task runs much faster on Spark than on Hadoop or on the tensor factorization model without an API framework.
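To make the imputation task concrete, the following is a minimal single-machine sketch of PARAFAC (CP) imputation for a three-way tensor. It is not the authors' Hadoop or Spark implementation, which the abstract does not show. It assumes alternating least squares with EM-style refilling of the missing entries, and the names `parafac_impute`, `khatri_rao`, `unfold`, and `reconstruct` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    rank = a.shape[1]
    return (a[:, None, :] * b[None, :, :]).reshape(-1, rank)


def unfold(tensor, mode):
    """Mode-n unfolding; the remaining axes are flattened in C order."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def reconstruct(factors):
    """Rebuild a three-way tensor from its CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def parafac_impute(tensor, mask, rank, n_iter=200, seed=0):
    """Fill entries where mask is False using a rank-`rank` CP model.

    Assumption: ALS updates on a completed tensor, with the missing
    entries refreshed from the current reconstruction after each sweep.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((n, rank)) for n in tensor.shape]
    # Start by filling the missing entries with the mean of the observed ones.
    filled = np.where(mask, tensor, tensor[mask].mean())
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            # Gram matrix of the Khatri-Rao product via a Hadamard product.
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(filled, mode) @ kr @ np.linalg.pinv(gram)
        # Keep observed values fixed; overwrite only the missing entries.
        filled = np.where(mask, tensor, reconstruct(factors))
    return filled
```

Calling `parafac_impute(tensor, mask, rank=2)` returns a tensor of the same shape in which observed entries are unchanged and missing entries carry the model's estimates. A distributed version would need to partition the unfolding and Khatri-Rao products across workers, which this sketch does not attempt.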