A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Herodotos Herodotou, Yuxing Chen, Jiaheng Lu
DOI: https://doi.org/10.1145/3381027
IF: 16.6
2021-03-31
ACM Computing Surveys
Abstract: Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators grapple with understanding and tuning them to achieve good performance. We investigate existing approaches on parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
computer science, theory & methods