TIE: Fast Experiment-driven ML-based Configuration Tuning for In-memory Data Analytics

Chao Chen, Jinhan Xin, Zhibin Yu
DOI: https://doi.org/10.1109/tc.2024.3365937
IF: 3.183
2024-01-01
IEEE Transactions on Computers
Abstract: Recently, experiment-driven machine-learning (ML)-based configuration tuning for in-memory data analytics frameworks such as Apache Spark has become popular because it can achieve high speedups. However, experiment-driven ML-based approaches inherently require a large number of iterations, and each iteration generates a configuration with a probabilistic strategy and executes the program on a real cluster with that configuration. Optimizing the performance of an in-memory data analytics program therefore takes a long time, which hinders these approaches from being widely used in practice. To address this issue, we propose a novel yet simple approach dubbed Terminating-It-Early (TIE) that reduces the time spent on experiment executions while achieving speedups similar to those obtained by experiment-driven ML-based approaches. The key idea is that, while searching for the optimal configuration (the one that yields the shortest execution time for a program), we terminate an experiment execution with a trial configuration as soon as we find that its execution time exceeds a predefined threshold (e.g., the shortest execution time found so far). In contrast, traditional experiment-driven ML-based approaches always run every experiment execution to completion. We evaluate TIE with 19 Apache Spark programs running on both a physical cluster and a virtual cluster. We compare the tuning time needed to find the optimal configuration of a program, and the optimized execution time of the program, obtained by TIE against those obtained by CherryPick and a reinforcement learning (RL)-based approach. The experimental results show that on physical machines, TIE reduces the tuning time of CherryPick and the RL-based approach by factors of 2.39× and 1.68× on average, respectively. On virtual machines, the corresponding factors are 2.79× and 1.71×.
Moreover, the average optimized execution time of the 19 programs tuned by TIE is slightly shorter than that achieved by CherryPick and by the RL-based approach.
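The early-termination rule described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_trial`, `tie_tune`, and `build_cmd` are hypothetical names; the threshold is the shortest execution time found so far (one of the options the abstract mentions); and candidate configurations come from a plain iterable rather than from a learned search strategy such as those used by CherryPick or the RL-based approach. The abstract does not say how terminated runs are fed back into the search model, so the sketch simply discards them.

```python
import math
import subprocess
import sys
import time
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical sketch of TIE-style early termination; not the authors' code.
Config = Dict[str, str]
# A runner takes (command, threshold_seconds) and returns
# (execution_time, or None if terminated early; wall-clock seconds spent).
Runner = Callable[[List[str], float], Tuple[Optional[float], float]]


def run_trial(cmd: List[str], threshold: float) -> Tuple[Optional[float], float]:
    """Run one experiment; kill it once it has run longer than `threshold` seconds."""
    start = time.monotonic()
    try:
        # subprocess.run kills the child and re-raises TimeoutExpired on timeout.
        subprocess.run(cmd, timeout=None if math.isinf(threshold) else threshold)
    except subprocess.TimeoutExpired:
        return None, time.monotonic() - start  # terminated early: no full runtime
    elapsed = time.monotonic() - start
    return elapsed, elapsed


def tie_tune(
    candidates: Iterable[Config],
    build_cmd: Callable[[Config], List[str]],
    run: Runner = run_trial,
) -> Tuple[Optional[Config], float, float]:
    """Try each candidate, using the best runtime so far as the termination threshold.

    Returns (best_config, best_execution_time, total_tuning_time).
    """
    best_cfg: Optional[Config] = None
    best_time = math.inf
    tuning_time = 0.0
    for cfg in candidates:
        elapsed, spent = run(build_cmd(cfg), best_time)
        tuning_time += spent  # a losing trial costs at most the current best time
        # Only runs that finished under the threshold can become the new best.
        if elapsed is not None and elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time, tuning_time


def _sleep_cmd(cfg: Config) -> List[str]:
    # Stand-in "program": a Python process that sleeps for cfg["sleep"] seconds.
    return [sys.executable, "-c", f"import time; time.sleep({cfg['sleep']})"]


if __name__ == "__main__":
    trials = [{"sleep": "0.6"}, {"sleep": "0.2"}, {"sleep": "1.0"}, {"sleep": "0.4"}]
    best, best_t, total = tie_tune(trials, _sleep_cmd)
    print(best, round(best_t, 2), round(total, 2))
```

In a Spark setting, `build_cmd` might turn a dict such as `{"spark.executor.memory": "4g"}` into `spark-submit --conf spark.executor.memory=4g app.py` (with `app.py` as a placeholder). The saving comes from capping every losing trial at the current best time instead of paying its full runtime, while the best configuration found from a given candidate sequence is unchanged.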
Engineering, Electrical & Electronic; Computer Science, Hardware & Architecture