Abstract: As deep learning techniques advance more than ever, hyper-parameter optimization is the new major workload in deep learning clusters. Although hyper-parameter optimization is crucial in training deep learning models for high model performance, effectively executing such a computation-heavy workload still remains a challenge. We observe that numerous trials issued from existing hyper-parameter optimization algorithms share common hyper-parameter sequence prefixes, which implies that there are redundant computations from training the same hyper-parameter sequence multiple times. We propose a stage-based execution strategy for efficient execution of hyper-parameter optimization algorithms. Our strategy removes redundancy in the training process by splitting the hyper-parameter sequences of trials into homogeneous stages, and generating a tree of stages by merging the common prefixes. Our preliminary experiment results show that applying stage-based execution to hyper-parameter optimization algorithms outperforms the original trial-based method, saving required GPU-hours and end-to-end training time by up to 6.60 times and 4.13 times, respectively.

What problem does this paper attempt to address?

The problem this paper addresses is the computational redundancy and resource waste in **hyper-parameter optimization (HPO)** during deep learning model training. Specifically, different trials issued by an HPO algorithm often share the same hyper-parameter sequence prefix, yet the traditional trial-based approach trains each shared prefix again in every trial, which wastes computational resources.

### Main contributions of the paper

1. **Stage-based execution strategy**:
   - The authors propose a new execution strategy that decomposes each trial into multiple homogeneous stages and reduces redundant computation by merging stages that share the same prefix.
   - The method uses resources more efficiently by constructing a stage tree that represents the relationships between different trials.
2. **Improving computational and resource efficiency**:
   - The experimental results show that, compared with the traditional trial-based execution strategy, the stage-based execution strategy reduces the required GPU-hours and the end-to-end training time by up to 6.60 times and 4.13 times, respectively.
3. **Supporting multi-study optimization**:
   - The method can also be extended to multiple studies, further improving the efficiency of hyper-parameter optimization by sharing the history of previous studies.
4. **Handling continuous search spaces**:
   - For hyper-parameter sequences with discrete values, the stage-based execution strategy shows significant advantages. For hyper-parameter sequences with continuous values there is less overlap, but some efficiency improvement can still be obtained through appropriate adjustments.
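
The prefix merging described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `StageNode`, `build_stage_tree`, `tree_iterations`, and `trial_iterations` are hypothetical, each stage is simplified to a `(config, iterations)` pair (resource requirements are omitted), and trials are assumed to be already split into stages on identical boundaries, so two stages merge only when they match exactly.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Optional, Tuple

# Hypothetical simplification: a stage is (hyper-parameter config, iterations).
Stage = Tuple[Hashable, int]


@dataclass
class StageNode:
    """One node of the stage tree; the root is an empty placeholder."""
    config: Optional[Hashable] = None
    iterations: int = 0
    children: Dict[Stage, "StageNode"] = field(default_factory=dict)


def build_stage_tree(trials: List[List[Stage]]) -> StageNode:
    """Merge trials that share a common stage prefix into a single tree."""
    root = StageNode()
    for trial in trials:
        node = root
        for stage in trial:
            # Reuse the existing child when the prefix was already seen.
            if stage not in node.children:
                node.children[stage] = StageNode(config=stage[0], iterations=stage[1])
            node = node.children[stage]
    return root


def tree_iterations(node: StageNode) -> int:
    """Iterations needed when every stage in the tree is trained once."""
    return node.iterations + sum(tree_iterations(c) for c in node.children.values())


def trial_iterations(trials: List[List[Stage]]) -> int:
    """Iterations needed when every trial is trained independently."""
    return sum(iters for trial in trials for _, iters in trial)


# Three made-up trials; the config is a learning rate, chosen for illustration.
trials = [
    [(0.1, 10), (0.01, 10)],
    [(0.1, 10), (0.001, 10)],
    [(0.1, 10), (0.01, 10), (0.001, 5)],
]

print(trial_iterations(trials))                   # 65
print(tree_iterations(build_stage_tree(trials)))  # 35
```

In this toy example the shared prefix `(0.1, 10)` is trained once instead of three times, which is where the saving comes from. A real system would also need to checkpoint the model at the end of each stage so that child stages can resume from it, and to split stages whose lengths differ.
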
### Formula representation

Some key concepts in the paper can be written formally as follows:

- **Hyper-parameter configuration**: Each trial can be represented as a hyper-parameter sequence \(\mathbf{h} = [h_1, h_2, \ldots, h_T]\), where \(T\) is the length of the sequence.
- **Stage tree**: Each node in the stage tree represents a stage and can be represented by a triple \((\mathbf{h}_i, t_i, r_i)\), where \(\mathbf{h}_i\) is the hyper-parameter configuration of this stage, \(t_i\) is the number of iterations, and \(r_i\) is the resource requirement.

### Summary

By introducing the stage-based execution strategy, this paper reduces the redundant computation in the hyper-parameter optimization process and improves the computational and resource efficiency of deep learning model training. This matters most for large-scale deep learning workloads, where hyper-parameter optimization is a major consumer of cluster resources.

Stage-based Hyper-parameter Optimization for Deep Learning

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees

Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset

Pre-training the Deep Generative Models with Adaptive Hyperparameter Optimization

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Scheduling Optimization Techniques for Neural Network Training

Efficient Hyper-parameter Optimization for NLP Applications

Hyper-Tune: Towards Efficient Hyper-parameter Tuning at Scale

Hyperparameter Optimization with Neural Network Pruning

Optimizing execution for pipelined‐based distributed deep learning in a heterogeneously networked GPU cluster

Efficient Hyperparameter Optimization with Probability-based Resource Allocating on Deep Neural Networks

Grouper: Accelerating Hyperparameter Searching in Deep Learning Clusters with Network Scheduling

BTTackler: A Diagnosis-based Framework for Efficient Deep Learning Hyperparameter Optimization

An Experimental Study on Hyper-parameter Optimization for Stacked Auto-Encoders

Deep Neural Network Hyperparameter Optimization with Orthogonal Array Tuning

Gradient Descent Optimization in Deep Learning Model Training Based on Multistage and Method Combination Strategy

Value Function Based Performance Optimization of Deep Learning Workloads

Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm

Hardware-aware Approach to Deep Neural Network Optimization

An Efficient Optimization Technique for Training Deep Neural Networks

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training