Abstract: Hyperparameter tuning remains a significant challenge for the training of deep neural networks (DNNs), requiring manual and/or time-intensive grid searches, increasing resource costs and presenting a barrier to the democratization of machine learning. The global initial learning rate for DNN training is particularly important. Several techniques have been proposed for automated learning rate tuning during training; however, they still require manual searching for the global initial learning rate. Though methods exist that do not require this initial selection, they suffer from poor performance. Here, we present ExpTest, a sophisticated method for initial learning rate searching and subsequent learning rate tuning for the training of DNNs. ExpTest draws on insights from linearized neural networks and the form of the loss curve, which we treat as a real-time signal upon which we perform hypothesis testing. We mathematically justify ExpTest and provide empirical support. ExpTest requires minimal overhead, is robust to hyperparameter choice, and achieves state-of-the-art performance on a variety of tasks and architectures, without initial learning rate selection or learning rate scheduling.
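
The idea of treating the loss curve as a signal and running a hypothesis test on it can be illustrated with a small sketch. This is not ExpTest's actual procedure. It assumes the loss floor is zero, fits a straight line to the log-loss over a window, and applies a one-sided test on the slope using a normal approximation to the t distribution. The function names, the window length, and the significance level `alpha` are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def slope_t_statistic(values):
    """Return (slope, t-statistic) of a least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in enumerate(values))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0.0:
        # Perfectly linear window: only the sign of the slope matters.
        t_stat = math.copysign(math.inf, slope) if slope != 0.0 else 0.0
        return slope, t_stat
    return slope, slope / std_err


def looks_like_exponential_decay(losses, alpha=0.05):
    """One-sided test of H0: log-loss slope >= 0 against H1: slope < 0."""
    # Assumption: all losses are positive and the loss floor C is zero,
    # so an exponential decay becomes a straight line in log space.
    log_losses = [math.log(loss) for loss in losses]
    _, t_stat = slope_t_statistic(log_losses)
    # Assumption: normal quantile used in place of the exact t quantile.
    critical = NormalDist().inv_cdf(alpha)
    return t_stat < critical


# Synthetic windows with deterministic "noise" so the example is reproducible.
decaying = [2.0 * math.exp(-0.1 * t) * (1 + 0.05 * math.sin(7 * t)) for t in range(30)]
diverging = [0.5 * math.exp(0.1 * t) * (1 + 0.05 * math.sin(7 * t)) for t in range(30)]

print(looks_like_exponential_decay(decaying))
print(looks_like_exponential_decay(diverging))
```

In this sketch a window that fails the test would be the cue to change the learning rate. How ExpTest actually reacts to a failed test is not specified in the abstract.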

What problem does this paper attempt to address?

This paper addresses hyperparameter tuning in the training of deep neural networks (DNNs), in particular the selection of the global initial learning rate. Existing techniques for automated learning-rate adjustment still require a manually chosen initial global learning rate, and methods that avoid this choice often perform poorly. The paper therefore proposes ExpTest, a method that automatically searches for the initial learning rate and then tunes it during training, so that DNNs can be trained efficiently without manual selection of the initial learning rate.

### Main contributions of the paper

1. **Automated learning-rate search and adjustment**:
   - **ExpTest**: Drawing on insights from linearized neural networks, the paper designs a lightweight algorithm that automatically searches for and adjusts the learning rate during training.
   - **No need for initial learning-rate selection**: ExpTest does not require a pre-selected initial global learning rate, which reduces the need for manual parameter tuning.
2. **Theoretical basis**:
   - **Exponential decay behavior**: The paper shows by mathematical derivation that, when training converges, the loss can be approximated by an exponential decay.
   - **Upper-bound estimation**: Based on the linear model, the paper proposes an upper-bound estimate for the learning rate that ensures convergence.
3. **Algorithm implementation**:
   - **Hypothesis testing**: Hypothesis tests (such as the F-test and t-test) detect exponential decay behavior in the loss curve and thereby decide whether the learning rate needs to be adjusted.
   - **Window-size selection**: The window size is selected dynamically by considering the point of maximum curvature of the loss curve, which improves the robustness of the algorithm.
4. **Experimental verification**:
   - **Multi-task verification**: Experiments on multiple tasks (including regression and classification) and different network architectures verify the effectiveness of ExpTest.
   - **Robustness analysis**: ExpTest is shown to be robust to hyperparameter selection (such as α and β) and to mini-batch size, so its performance remains stable under different conditions.

### Key formulas

- **Gradient-descent update rule**:
  \[ \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) \]
- **Loss function**:
  \[ L = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 \]
- **Upper bound of learning rate**:
  \[ \eta_{\text{max}} = \frac{2ms}{\lambda_{\text{max}}(s - 1)} \]
- **Exponentially decaying loss function**:
  \[ L(t) = C + \sum A e^{-Bt} \]

### Experimental results

- **Handwritten digit classification task**:
  - On the MNIST dataset, ExpTest shows the fastest initial convergence and, in most cases, reaches the lowest training loss and the highest test accuracy.
  - Across different learning rates, mini-batch sizes, and hyperparameter selections, ExpTest shows good robustness and stability.

Through these contributions, the paper provides an efficient, automated learning-rate tuning method that is expected to support further research and application in deep learning.
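
The gradient-descent update rule and the squared-error loss from the key formulas can be tried on a small linear model to see why a learning-rate upper bound exists. The sketch below is an illustration under stated assumptions, not the paper's derivation. It uses the classical stability condition for a quadratic loss, η < 2/λ_max(H) with H = XᵀX/m the Hessian of the loss, and does not reproduce the s-dependent factor in the upper-bound formula above, since s is not defined in this summary. The dataset and all function names are hypothetical.

```python
import math


def predict(theta, row):
    return sum(t * x for t, x in zip(theta, row))


def mse_loss(theta, X, y):
    """L = 1/(2m) * sum((y_hat_i - y_i)^2) for a linear model y_hat = X @ theta."""
    m = len(X)
    return sum((predict(theta, row) - yi) ** 2 for row, yi in zip(X, y)) / (2 * m)


def gradient(theta, X, y):
    """Gradient of mse_loss with respect to theta."""
    m = len(X)
    grad = [0.0] * len(theta)
    for row, yi in zip(X, y):
        err = predict(theta, row) - yi
        for j, x in enumerate(row):
            grad[j] += err * x / m
    return grad


def largest_hessian_eigenvalue(X):
    """Largest eigenvalue of H = X^T X / m for a two-feature model (closed form)."""
    m = len(X)
    a = sum(r[0] * r[0] for r in X) / m
    b = sum(r[0] * r[1] for r in X) / m
    d = sum(r[1] * r[1] for r in X) / m
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)


def run_gd(eta, X, y, steps=50):
    """Apply theta <- theta - eta * grad L(theta) and record the loss at each step."""
    theta = [0.0, 0.0]
    losses = [mse_loss(theta, X, y)]
    for _ in range(steps):
        grad = gradient(theta, X, y)
        theta = [t - eta * g for t, g in zip(theta, grad)]
        losses.append(mse_loss(theta, X, y))
    return losses


# Hypothetical data: y = 1 + 2x, fitted with a bias column and one feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

eta_crit = 2.0 / largest_hessian_eigenvalue(X)  # classical bound, an assumption here
stable = run_gd(0.9 * eta_crit, X, y)
unstable = run_gd(1.1 * eta_crit, X, y)

print(f"critical learning rate: {eta_crit:.4f}")
print(f"loss after 50 steps just below the bound: {stable[-1]:.3e}")
print(f"loss after 50 steps just above the bound: {unstable[-1]:.3e}")
```

Just below the bound, every eigen-direction of the error shrinks geometrically, so the loss is a sum of decaying exponentials in the step count, which matches the form L(t) = C + Σ A e^{-Bt} with C = 0 for this exactly fittable data. Just above the bound, the direction associated with λ_max grows at every step and the loss diverges.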

ExpTest: Automating Learning Rate Searching and Tuning with Insights from Linearized Neural Networks
