A Collaborative Filtering Based Approach To Performance Prediction For Parallel Applications

Qingshi Shao, Li Pan, Shijun Liu, Xinyan Liu
DOI: https://doi.org/10.1109/CSCWD.2017.8066716
2017-01-01
Abstract: Parallel application jobs make up a large share of the workload in today's cloud computing and Big Data processing services, and their execution time can vary greatly with the runtime configuration. To schedule resources and services efficiently for parallel jobs, the ability to quickly and accurately estimate the performance of parallel applications is critical. Analytic predictive models based on traditional modeling techniques such as queuing systems are difficult to construct for parallel applications because of the high structural complexity of parallel application models. Furthermore, because resource computing capacities are heterogeneous in a scalable computing environment such as a cloud computing platform, performance analysis and prediction become increasingly difficult for parallel applications. To address this problem, we propose a collaborative filtering based approach to quickly and accurately predict the execution time of parallel applications running on heterogeneous resources. In particular, we use the widely used Apache Spark platform as the running framework for parallel applications, and propose a bounds-based performance model to improve the prediction accuracy. Through extensive simulations and experiments on real Spark clusters with two large-scale machine learning applications as well as the simple but classic WordCount sample application, we show that the proposed collaborative filtering based approach and bounds-based performance model can accurately estimate the performance of parallel applications.
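The abstract does not say which collaborative filtering variant the authors use, so the following is only a minimal sketch of the general idea, assuming a textbook neighbourhood-based scheme with Pearson similarity. Applications play the role of "users", runtime configurations play the role of "items", and observed execution times are the "ratings". The configuration labels, the runtimes, and the function names `pearson` and `predict_runtime` are hypothetical and are not taken from the paper; the bounds-based performance model is not represented here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two apps over the configurations both have run on."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0  # not enough overlap to estimate a correlation
    mean_a = sum(a[c] for c in shared) / len(shared)
    mean_b = sum(b[c] for c in shared) / len(shared)
    cov = sum((a[c] - mean_a) * (b[c] - mean_b) for c in shared)
    var_a = sum((a[c] - mean_a) ** 2 for c in shared)
    var_b = sum((b[c] - mean_b) ** 2 for c in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict_runtime(runtimes, app, config):
    """Estimate the runtime of `app` on an unseen `config` from similar apps.

    Uses the mean-centred, similarity-weighted average of the neighbours'
    deviations on `config`. Falls back to the app's own mean runtime when
    no neighbour carries any information.
    """
    target = runtimes[app]
    target_mean = sum(target.values()) / len(target)
    weighted_dev = 0.0
    weight_sum = 0.0
    for other, observed in runtimes.items():
        if other == app or config not in observed:
            continue
        sim = pearson(target, observed)
        other_mean = sum(observed.values()) / len(observed)
        weighted_dev += sim * (observed[config] - other_mean)
        weight_sum += abs(sim)
    if weight_sum == 0:
        return target_mean
    return target_mean + weighted_dev / weight_sum


# Hypothetical execution times in seconds, keyed by an assumed
# "executors x cores" configuration label.
runtimes = {
    "wordcount": {"2x4": 100.0, "4x4": 60.0, "8x4": 40.0},
    "kmeans": {"2x4": 200.0, "4x4": 120.0, "8x4": 80.0},
    "logreg": {"2x4": 150.0, "4x4": 90.0},
}

predicted = predict_runtime(runtimes, "logreg", "8x4")
print(f"predicted logreg runtime on 8x4: {predicted:.1f} s")
```

In this sketch the prediction for the unseen configuration is driven entirely by how the other applications' runtimes deviate from their own averages on that configuration. A practical predictor would likely need many more observations per application and some way of handling the nonlinear scaling behaviour of parallel jobs.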