Comparison and Improvement of Hadoop MapReduce Performance Prediction Models in the Private Cloud.

Nini Wang, Jian Yang, ZhiHui Lu, Xiaoyan Li, Jie Wu
DOI: https://doi.org/10.1007/978-3-319-49178-3_6
2016-01-01
Abstract: Performance modeling for MapReduce applications that process large-scale data is an important issue in the optimization, evaluation, prediction, and resource scheduling of jobs on big data and cloud computing platforms. In this paper, we study the Hadoop distributed computing framework, a prevailing Big Data solution. We use the locally weighted linear regression (LWLR) algorithm and the linear regression (LR) algorithm to build three computing models, each based on different characteristics, that estimate the execution time of large-scale data applications running on Hadoop, and we compare and improve the three models. Having built several experimental environments and run different types of jobs, we conclude that all three models perform very well in predicting the execution time and evaluating the performance of large-scale data applications from measurements taken with small-scale data.
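As an illustration of the two regression techniques named in the abstract, the following minimal sketch fits execution time against a single feature, input size. It is not the paper's model: the single-feature setup, the Gaussian kernel, the bandwidth `tau`, the function names, and the sample measurements are all assumptions made for this example.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        raise ValueError("degenerate fit: need at least two distinct, weighted x values")
    b = (sw * sxy - sx * sy) / denom
    a = (sy - b * sx) / sw
    return a, b


def predict_lr(xs, ys, x_query):
    """Ordinary linear regression: every sample gets the same weight."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    return a + b * x_query


def predict_lwlr(xs, ys, x_query, tau=2.0):
    """Locally weighted linear regression at x_query.

    Samples near x_query get larger weights through a Gaussian kernel
    with bandwidth tau (an assumed choice of kernel for this sketch).
    """
    ws = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    a, b = weighted_line_fit(xs, ys, ws)
    return a + b * x_query


# Hypothetical measurements (not from the paper): input size in GB
# and observed job execution time in seconds on small-scale runs.
sizes_gb = [1.0, 2.0, 3.0, 4.0]
times_s = [7.0, 9.0, 11.0, 13.0]

if __name__ == "__main__":
    print("LR estimate at 5 GB:  ", predict_lr(sizes_gb, times_s, 5.0))
    print("LWLR estimate at 5 GB:", predict_lwlr(sizes_gb, times_s, 5.0))
```

On perfectly linear data the two estimates coincide. They diverge when the relationship curves, because LWLR refits the line around each query point instead of using one global fit.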