Hyperparameter Recommendation Integrated With Convolutional Neural Network

Liping Deng, Wen-Sheng Chen, Binbin Pan, MingQing Xiao
DOI: https://doi.org/10.1109/TNNLS.2024.3476439
2024-10-18
Abstract: Hyperparameter recommendation via meta-learning has shown great promise in various studies. The main challenge for meta-learning is how to develop an effective meta-learner (learning algorithm) that can capture the intrinsic relationship between dataset characteristics and the empirical performance of hyperparameters. Existing meta-learners are mostly based on traditional machine-learning models that learn data representations with only a single layer; such models cannot learn complex features and often fail to capture properties deeply embedded in the data. To address this issue, in this article, we propose hyperparameter recommendation approaches that integrate the learning model with convolutional neural networks (CNNs). Specifically, we first formulate the recommendation task as a regression problem, where dataset characteristics are treated as predictors and the historical performance of hyperparameters as responses. We establish a CNN-based learning model with feature selection capability to serve as the regressor. We then develop a convolutional denoising autoencoder (ConvDAE) that leverages the spatial structure of the entire hyperparameter performance space and, under the multidimensional framework, estimates the performance of hyperparameters via denoising when only part of the hyperparameters have been evaluated. To make our approach flexible in applications, we establish a comprehensive two-branch CNN model that can utilize both dataset characteristics and partial evaluations to make effective recommendations. We conduct extensive experiments on 400 real classification problems with the well-known SVM. Our proposed approaches outperform existing meta-learning baselines as well as various search algorithms, demonstrating the effectiveness of deep learning for hyperparameter recommendation.
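The regression formulation described in the abstract (dataset characteristics as predictors, historical hyperparameter performance as responses) can be illustrated with a minimal sketch. This is not the paper's CNN-based regressor: a simple k-nearest-neighbour averaging rule is swapped in as a stand-in meta-learner so the example stays dependency-free. The function names, the two toy meta-features, the (C, gamma) grid, and every performance value below are made-up assumptions for illustration only.

```python
import math

def knn_predict_performance(meta_features, performance, query, k=2):
    """Predict a performance vector for `query` by averaging the
    performance vectors of its k nearest historical datasets.
    (Stand-in meta-learner; the paper uses a CNN-based regressor.)"""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(meta_features))
    nearest = [i for _, i in dists[:k]]
    n_configs = len(performance[0])
    return [sum(performance[i][j] for i in nearest) / k for j in range(n_configs)]

def recommend(predicted, grid):
    """Return the hyperparameter configuration with the highest predicted score."""
    best = max(range(len(predicted)), key=predicted.__getitem__)
    return grid[best]

# Hypothetical SVM grid: 3 values of C x 2 values of gamma = 6 configurations.
grid = [(C, gamma) for C in (0.1, 1.0, 10.0) for gamma in (0.01, 0.1)]

# Predictors: one row of (made-up) dataset characteristics per historical dataset.
meta_features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]

# Responses: (made-up) accuracy of each grid configuration on each dataset.
performance = [
    [0.60, 0.62, 0.70, 0.90, 0.65, 0.61],
    [0.58, 0.60, 0.72, 0.88, 0.66, 0.63],
    [0.91, 0.70, 0.69, 0.60, 0.55, 0.50],
    [0.89, 0.72, 0.65, 0.62, 0.57, 0.52],
]

# A new dataset is described only by its characteristics.
query = [0.05, 0.02]
predicted = knn_predict_performance(meta_features, performance, query, k=2)
recommended = recommend(predicted, grid)
print(recommended)
```

The point of the sketch is the data layout: each historical dataset contributes one predictor row and one response row over the whole hyperparameter grid, and a recommendation is simply the best-scoring entry of the predicted response for an unseen dataset.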