Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization

Thomas Nagler, Lennart Schneider, Bernd Bischl, Matthias Feurer

2024-05-24

Abstract: Hyperparameter optimization is crucial for obtaining peak performance of machine learning models. The standard protocol evaluates various hyperparameter configurations using a resampling estimate of the generalization error to guide optimization and select a final hyperparameter configuration. Without much evidence, paired resampling splits, i.e., either a fixed train-validation split or a fixed cross-validation scheme, are often recommended. We show that, surprisingly, reshuffling the splits for every configuration often improves the final model's generalization performance on unseen data. Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment. While reshuffling leads to test performances that are competitive with using fixed splits, it drastically improves results for a single train-validation holdout protocol and can often make holdout become competitive with standard CV while being computationally cheaper.

Machine Learning

What problem does this paper attempt to address?

This paper asks whether the resampling splits used in hyperparameter optimization (HPO) should stay fixed across configurations. The standard protocol evaluates every configuration on the same train-validation split or the same cross-validation folds and selects the configuration with the lowest estimated generalization error. The paper shows that drawing a fresh split for each configuration often improves the final model's generalization performance on unseen data.

On the theory side, the paper analyzes how reshuffling changes the asymptotic behavior of the validation loss surface and derives a bound on the expected regret in the limiting regime. The bound ties the potential benefit of reshuffling to the signal and noise characteristics of the optimization problem: reshuffling helps most when the loss surface is flat and the estimation noise is large. A controlled simulation study confirms these results, and a large-scale, realistic HPO experiment demonstrates their practical relevance. The paper also discusses how reshuffling relates to overfitting the validation estimate and how it affects optimizers such as random search and Bayesian optimization (BO).

Empirically, reshuffling yields test performance that is competitive with fixed splits overall. It typically has only a small effect on 5-fold cross-validation, but it drastically improves a single train-validation holdout protocol, often making holdout competitive with standard cross-validation at lower computational cost. The paper thus proposes a simple but little-known change to the HPO protocol that can improve how well the selected configuration generalizes to new data.
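The difference between the two protocols can be illustrated with a minimal sketch. It is not the authors' implementation: the toy model (a shrunken training mean), the squared-error loss, and the names `holdout_split`, `validation_loss`, and `random_search` are all assumptions chosen for illustration. With `reshuffle=False` every configuration is scored on one fixed holdout split; with `reshuffle=True` a new split is drawn for each configuration.

```python
import random
from statistics import fmean


def holdout_split(n, valid_frac, rng):
    """Return (train_idx, valid_idx) for a random holdout split of n points."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_valid = max(1, round(valid_frac * n))
    return idx[n_valid:], idx[:n_valid]


def validation_loss(y, train_idx, valid_idx, shrinkage):
    """Toy model (an assumption): predict shrinkage * mean of the training targets."""
    prediction = shrinkage * fmean([y[i] for i in train_idx])
    return fmean([(y[i] - prediction) ** 2 for i in valid_idx])


def random_search(y, configs, reshuffle, seed=0, valid_frac=0.2):
    """Select the config with the lowest holdout loss.

    reshuffle=False: one fixed split is shared by all configurations.
    reshuffle=True:  a fresh split is drawn for every configuration.
    """
    rng = random.Random(seed)
    fixed_split = holdout_split(len(y), valid_frac, rng)
    splits_used = []
    best_config, best_loss = None, float("inf")
    for config in configs:
        if reshuffle:
            train_idx, valid_idx = holdout_split(len(y), valid_frac, rng)
        else:
            train_idx, valid_idx = fixed_split
        splits_used.append(tuple(valid_idx))
        loss = validation_loss(y, train_idx, valid_idx, config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, splits_used


# Synthetic data and randomly sampled candidate configurations.
data_rng = random.Random(42)
y = [data_rng.gauss(1.0, 1.0) for _ in range(50)]
configs = [data_rng.uniform(0.0, 1.5) for _ in range(20)]

best_fixed, splits_fixed = random_search(y, configs, reshuffle=False)
best_reshuffled, splits_reshuffled = random_search(y, configs, reshuffle=True)
print("fixed split picks:", best_fixed)
print("reshuffled splits pick:", best_reshuffled)
```

The sketch only shows where the split is drawn in the search loop. Whether reshuffling helps on a given problem depends, per the paper's analysis, on the signal and noise characteristics of the loss surface, so a single toy run should not be read as evidence either way.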


An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

Efficient hyperparameters optimization through model-based reinforcement learning with experience exploiting and meta-learning

On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning

High Probability Guarantees for Random Reshuffling

Hyperparameters in Reinforcement Learning and How To Tune Them

Using sequential statistical tests for efficient hyperparameter tuning

Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition

Optimizing for Generalization in Machine Learning with Cross-Validation Gradients

Discrete Simulation Optimization for Tuning Machine Learning Method Hyperparameters

A Modified Bayesian Optimization based Hyper-Parameter Tuning Approach for Extreme Gradient Boosting

Exploring the Optimized Value of Each Hyperparameter in Various Gradient Descent Algorithms

Quantity vs. Quality: On Hyperparameter Optimization for Deep Reinforcement Learning

Decision Confidence and Outcome Variability Optimally Regulate Separate Aspects of Hyperparameter Setting

Learning to Optimize Computational Resources: Frugal Training with Generalization Guarantees

Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset

Enhancing the Performance of Bandit-based Hyperparameter Optimization

Stage-based Hyper-parameter Optimization for Deep Learning

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization

Data splitting improves statistical performance in overparametrized regimes