Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling

Sewal, Piyush; Singh, Hari
DOI: https://doi.org/10.1007/s10586-024-04478-4
2024-05-08
Cluster Computing
Abstract: The performance optimization of Apache Spark, a widely used distributed computing framework, is crucial for the efficient execution of data-intensive workloads. However, automatically tuning Spark's roughly 180 internal configuration features for optimum performance can be a complex, expensive, and time-consuming task. This research paper addresses these challenges, together with the task of reducing execution time and tuning cost for benchmark Spark MLlib workloads on a benchmark dataset, by automating the tuning process. A novel grid-search-based approach, Reduced Internal Configuration Grid (RICG), is proposed; it shrinks the grid of Spark's configuration feature values by leveraging exponential projective sampling. A significant reduction in Spark workload execution time is obtained at low cost without compromising the performance of the system's CPU, memory, and network resources. Extensive experimentation and evaluation demonstrate the effectiveness of the proposed approach, showing reductions of around 26–30% in execution time for benchmark workloads. This reduction is achieved alongside a cut of approximately 58–60% in feature-tuning cost compared to the prior best grid-search-based approach. The proposed RICG model outperforms existing statistical trial-test approaches, striking a favorable balance between performance optimization and cost optimization. Lastly, the logs generated by the RICG model are used to predict the performance of Spark application workloads with boosting algorithms, at around 90% accuracy. The findings highlight the potential of automated tuning techniques to enhance the performance of Apache Spark-based applications cost-efficiently.
computer science, information systems, theory & methods
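The abstract does not spell out how exponential projective sampling reduces the configuration grid. The sketch below is one plausible reading, not the authors' method. It assumes that each feature keeps only geometrically spaced levels (low, low·2, low·4, ... up to its maximum) instead of every linear step, and that the reduced grid is the Cartesian product of those levels. The three property names are real Spark configuration keys. The search ranges, the base of 2, and every helper function (`exponential_levels`, `linear_levels`, `full_grid`, `reduced_grid`, `grid_size`, `configurations`) are hypothetical and chosen only for illustration. The resulting grid sizes are not figures from the paper.

```python
import itertools

def exponential_levels(low, high, base=2):
    """Geometric levels low, low*base, low*base**2, ... not exceeding high."""
    if low <= 0 or base <= 1:
        raise ValueError("need low > 0 and base > 1")
    levels = []
    value = low
    while value <= high:
        levels.append(value)
        value *= base
    return levels

def linear_levels(low, high, step):
    """Every step-spaced level from low to high inclusive (an exhaustive grid axis)."""
    return list(range(low, high + 1, step))

# Hypothetical (low, high, linear step) ranges for three real Spark properties.
RANGES = {
    "spark.executor.cores": (1, 8, 1),
    "spark.executor.memory": (1, 16, 1),  # in GiB; rendered as e.g. "4g"
    "spark.default.parallelism": (50, 800, 50),
}

def full_grid(ranges):
    """Exhaustive grid: every linear step of every feature."""
    return {k: linear_levels(lo, hi, step) for k, (lo, hi, step) in ranges.items()}

def reduced_grid(ranges):
    """Reduced grid: only exponentially spaced levels of every feature (assumed reading)."""
    return {k: exponential_levels(lo, hi) for k, (lo, hi, _step) in ranges.items()}

def grid_size(axes):
    """Number of configurations in the Cartesian product of all axes."""
    size = 1
    for levels in axes.values():
        size *= len(levels)
    return size

def configurations(axes):
    """Yield each grid point as a dict of Spark property strings."""
    keys = list(axes)
    for combo in itertools.product(*(axes[k] for k in keys)):
        conf = {}
        for key, value in zip(keys, combo):
            conf[key] = f"{value}g" if key == "spark.executor.memory" else str(value)
        yield conf

if __name__ == "__main__":
    print(grid_size(full_grid(RANGES)), grid_size(reduced_grid(RANGES)))  # 2048 100
    print(next(configurations(reduced_grid(RANGES))))
```

With these illustrative ranges, the exhaustive grid has 8 × 16 × 16 = 2048 points, and the geometric grid has 4 × 5 × 5 = 100. Geometric spacing keeps the number of levels per feature logarithmic in the width of its range, which is why the grid shrinks so sharply. Each yielded dictionary could then be passed to a run, for example as repeated `spark-submit --conf key=value` arguments, and timed.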