Abstract: Recently, experiment-driven machine-learning (ML) based configuration tuning for in-memory data analytics frameworks such as Apache Spark has become popular because it can achieve high speedups. However, experiment-driven ML-based approaches inherently need a large number of iterations, and each iteration generates a configuration with a probabilistic strategy and executes the program on a real cluster with that configuration. Optimizing the performance of an in-memory data analytics program therefore takes a long time, which hinders these approaches from being widely used in practice. To address this issue, we propose a novel yet simple approach dubbed Terminating-It-Early (TIE) that reduces the time needed for the experiment executions while achieving speedups similar to those obtained by experiment-driven ML-based approaches. The key idea is that, while searching for the optimal configuration that produces the shortest execution time for a program, we terminate an experiment execution with a trial configuration as soon as we find that its execution time exceeds a predefined threshold (e.g., the shortest execution time thus far). In contrast, traditional experiment-driven ML-based approaches always run every experiment execution to completion. We evaluate TIE with 19 Apache Spark programs running on a physical cluster as well as a virtual cluster. We compare the tuning time needed to find the optimal configuration of a program, and the optimized execution time of that program, obtained by TIE against those obtained by CherryPick and a reinforcement learning (RL) based approach. The experimental results show that on physical machines, TIE reduces the tuning time used by CherryPick and the RL-based approach by factors of 2.39× and 1.68× on average, respectively. On virtual machines, the corresponding factors are 2.79× and 1.71×. Moreover, the average optimized execution time of the 19 programs tuned by TIE is slightly shorter than that of the programs tuned by CherryPick and the RL-based approach.
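The early-termination idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the ML-based search (CherryPick or RL) for plain random sampling, and the names `tune`, `sample_config`, and `simulated_runtime`, as well as the two configuration parameters and the toy cost model, are hypothetical and chosen only for illustration. The threshold is taken to be the shortest execution time seen so far, which is the example the abstract gives.

```python
import random


def sample_config(rng):
    # Hypothetical two-parameter search space (illustration only).
    return {
        "executor_memory_gb": rng.choice([2, 4, 8, 16]),
        "shuffle_partitions": rng.choice([50, 100, 200, 400]),
    }


def simulated_runtime(cfg):
    # Toy cost model standing in for a real cluster run (an assumption,
    # not a model of actual Spark behaviour). Returns seconds.
    return 600.0 / cfg["executor_memory_gb"] + abs(cfg["shuffle_partitions"] - 200) * 0.5


def tune(runtime_of, sampler, n_iters, early_stop, seed=0):
    """Random-search tuning loop with optional early termination.

    Returns (best_config, best_time, total_tuning_time).
    """
    rng = random.Random(seed)
    best_cfg, best_time = None, float("inf")
    total = 0.0
    for _ in range(n_iters):
        cfg = sampler(rng)
        full_time = runtime_of(cfg)
        # Threshold: shortest execution time so far (infinite on the first trial).
        threshold = best_time if early_stop else float("inf")
        # A run that would exceed the threshold is killed at the threshold,
        # so only `threshold` seconds of cluster time are spent on it.
        total += min(full_time, threshold)
        if full_time < best_time:
            best_cfg, best_time = cfg, full_time
    return best_cfg, best_time, total


full = tune(simulated_runtime, sample_config, 30, early_stop=False)
tie = tune(simulated_runtime, sample_config, 30, early_stop=True)
print("same best config:", full[0] == tie[0])
print("tuning time without / with early termination:", full[2], tie[2])
```

Because both runs draw the same trial configurations, they end with the same best configuration, and the early-terminating run never spends more simulated cluster time than the full run. In a model-guided search the terminated runs would yield only a lower bound on execution time, so how the search strategy consumes those censored observations is a design question this sketch does not address.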

TIE: Fast Experiment-driven ML-based Configuration Tuning for In-memory Data Analytics
