Be aware of overfitting by hyperparameter optimization!

Igor V. Tetko, Ruud van Deursen, Guillaume Godin

2024-07-30

Abstract: Hyperparameter optimization is very frequently employed in machine learning. However, optimizing over a large parameter space can result in overfitted models. In recent studies on solubility prediction, the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study we show that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be obtained using pre-set hyperparameters, reducing the computational effort by around 10,000 times. We also extended the previous analysis by adding Transformer CNN, a representation learning method based on Natural Language Processing of SMILES. We show that, across all analyzed sets and using exactly the same protocol, Transformer CNN provided better results than graph-based methods in 26 out of 28 pairwise comparisons while using only a tiny fraction of the time required by the other methods. Last but not least, we stress the importance of comparing calculation results using exactly the same statistical measures.

Machine Learning, Artificial Intelligence

What problem does this paper attempt to address?

The paper attempts to address the following issues:

1. **The necessity of hyperparameter optimization**: The paper explores whether hyperparameter optimization in machine learning truly leads to significant improvements in model performance. The authors found through experiments that in some cases, using preset hyperparameters (i.e., hyperparameters that have not been optimized) can achieve similar or even better results. This may be due to over-optimization leading to model overfitting.
2. **The demand for computational resources**: Hyperparameter optimization typically requires a large amount of computational resources, especially when dealing with large-scale datasets. The authors demonstrate how to achieve good model performance without hyperparameter optimization using relatively limited computational resources (such as ordinary clusters in an academic environment), thereby significantly reducing computational costs.
3. **Comparison of different methods**: The paper compares the performance of graph-based methods (such as Attentive FingerPrint and ChemProp) with natural language processing-based methods (such as Transformer CNN) in predicting water solubility. The results show that Transformer CNN provides higher accuracy in most cases and requires much less computation time than other methods.
4. **The impact of data cleaning and organization**: The paper analyzes the impact of different data cleaning and organization methods on model performance. The authors found that even after data cleaning and organization, models with preset hyperparameters can still perform comparably to those with optimized hyperparameters.
5. **Consistency of statistical metrics**: The paper emphasizes the importance of using the same statistical metrics when comparing the performance of different models. The authors point out the differences between traditional RMSE and custom cuRMSE, and discuss the impact of these differences on model performance evaluation.

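The last point, about comparing models under the same metric, can be made concrete with a small sketch. This summary does not give the definition of cuRMSE, so the `weighted_rmse` function below is a hypothetical stand-in (a record-weighted RMSE), not the authors' formula; the data and weights are invented for illustration. It only shows that two reasonable-looking error measures can give different numbers for identical predictions, which is why scores computed with different measures should not be compared directly.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Standard root-mean-square error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def weighted_rmse(y_true, y_pred, weights):
    """Hypothetical weighted RMSE (an illustrative stand-in, NOT the paper's cuRMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w)))

# Invented toy data: the two records with large errors carry a lower weight.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [1.0, 1.0, 3.0, 3.0]
weights = [1.0, 1.0, 0.5, 0.5]

plain = rmse(y_true, y_pred)
weighted = weighted_rmse(y_true, y_pred, weights)
print(f"RMSE = {plain:.3f}, weighted RMSE = {weighted:.3f}")
```

With these toy values the weighted score is lower than the plain RMSE for exactly the same predictions, so a model reported under one measure would look better than the same model reported under the other.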
In summary, the main purpose of this paper is to explore the practical value of hyperparameter optimization in machine learning and how to efficiently train models under limited resources, while also providing a new, efficient method for predicting water solubility.
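One way to see how hyperparameter optimization can overfit is a small simulation of selection bias. The sketch below is not the paper's protocol: it assumes a set of hypothetical configurations that are all equally good (their residuals come from the same normal distribution) and picks the one with the lowest RMSE on a finite validation set. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def selection_bias_demo(n_configs=200, n_val=100, n_test=100, noise_sd=1.0, seed=0):
    """Pick the best of n_configs equally good models on a single validation set.

    Returns (validation RMSE of the winner,
             fresh test RMSE of the winner,
             mean validation RMSE over all configurations).
    """
    rng = np.random.default_rng(seed)
    # Assumption: every configuration has the same true error, N(0, noise_sd).
    val_resid = rng.normal(0.0, noise_sd, size=(n_configs, n_val))
    test_resid = rng.normal(0.0, noise_sd, size=(n_configs, n_test))
    val_rmse = np.sqrt((val_resid ** 2).mean(axis=1))
    test_rmse = np.sqrt((test_resid ** 2).mean(axis=1))
    best = int(np.argmin(val_rmse))  # the "optimized" configuration
    return float(val_rmse[best]), float(test_rmse[best]), float(val_rmse.mean())

val_best, test_best, val_mean = selection_bias_demo()
print(f"winner on validation: {val_best:.3f}")
print(f"winner on fresh data: {test_best:.3f}")
print(f"average configuration on validation: {val_mean:.3f}")
```

Because the winner is chosen for its validation score, that score is biased downward even though no configuration is truly better than another. The larger the search space relative to the validation set, the stronger this effect, which is consistent with the paper's caution about optimizing and reporting on the same statistical measure.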

Prediction of intrinsic solubility for drug-like organic compounds using Automated Network Optimizer (ANO) for physicochemical feature and hyperparameter optimization

Is One Hyperparameter Optimizer Enough?

Efficient Hyper-parameter Optimization for NLP Applications.

Discrete Simulation Optimization for Tuning Machine Learning Method Hyperparameters

Scaling Exponents Across Parameterizations and Optimizers

Will we ever be able to accurately predict solubility?

HOAX: A Hyperparameter Optimization Algorithm Explorer for Neural Networks

Training Deep Neural Networks by optimizing over nonlocal paths in hyperparameter space

HomOpt: A Homotopy-Based Hyperparameter Optimization Method

Simmering: Sufficient is better than optimal for training neural networks

Hyperopt: a Python library for model selection and hyperparameter optimization

Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

Two-step hyperparameter optimization method: Accelerating hyperparameter search by using a fraction of a training dataset

On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice

Efficient Combustion Kinetic Parameter Optimization Via Variational Inference

Scaling Laws for Hyperparameter Optimization

Hyperparameter Optimization of the Machine Learning Model for Distillation Processes

An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks

Learn to Optimize - A Brief Overview

Methodology for Hyperparameter Tuning of Deep Neural Networks for Efficient and Accurate Molecular Property Prediction