Efficient Non-Parametric Optimizer Search for Diverse Tasks

Ruochen Wang, Yuanhao Xiong, Minhao Cheng, Cho-Jui Hsieh
DOI: https://doi.org/10.48550/arXiv.2209.13575
2022-09-28
Abstract: Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited by their scalability, generability, or sample efficiency. With the goal of democratizing research and application of optimizer search, we present the first efficient, scalable and generalizable framework that can directly search on the tasks of interest. We first observe that optimizer updates are fundamentally mathematical expressions applied to the gradient. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree, where each path encodes an optimizer. This way, optimizer search can be naturally formulated as a path-finding problem, allowing a variety of well-established tree traversal methods to be used as the search algorithm. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection that leverage the characteristics of optimizer update rules to further boost the sample efficiency. We provide a diverse set of tasks to benchmark our algorithm and demonstrate that, with only 128 evaluations, the proposed framework can discover optimizers that surpass both human-designed counterparts and prior optimizer search methods.
Machine Learning, Artificial Intelligence, Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses the efficiency, scalability, and generalization of automated optimizer design. Specifically, the authors focus on how to efficiently search for optimizers suited to a variety of tasks within an automated machine learning (AutoML) system.

#### Background and Motivation

1. **Limitations of Existing Methods**:
   - **Scalability**: Parametric optimizers (such as L2O) scale poorly to large-scale models or datasets because they require expensive meta-learning steps (for example, backpropagation through gradient descent).
   - **Sample Efficiency**: Non-parametric optimizer search frameworks (such as NOS-RL) perform relatively well but require more than 10,000 evaluations, which makes them too computationally expensive for practical use.
   - **Generalization Ability**: Existing optimizers often perform poorly on slight variants of their training tasks, which limits their use as general-purpose optimizers.
2. **Research Objectives**:
   - To provide an efficient, scalable, and generalizable optimizer search framework that can be applied directly to the tasks of interest.
   - To reduce the computational cost of optimizer search, making it more accessible to researchers and practitioners.

#### Main Contributions

1. **Proposing a New Search Space Representation**:
   - Treat optimizer update rules as mathematical expressions and use their inherent tree structure to reorganize the space of optimizers into a "super-tree". Each path encodes an optimizer, which turns optimizer search into a path-finding problem.
2. **Adapting the Monte Carlo Method to Tree Search**:
   - Use Monte Carlo sampling in tree search (MCT), combined with rejection sampling and equivalent-form detection, which significantly improve sample efficiency. These techniques help avoid evaluating poorly performing optimizers and redundant expressions.
3. **Extensive Experimental Verification**:
   - Evaluations were carried out on multiple tasks, including handwritten digit classification, image classification, graph neural network training, adversarial attacks, and BERT fine-tuning. The results show that, with only 128 evaluations, the proposed framework can discover optimizers that outperform both human-designed optimizers and prior optimizer search methods.

### Formulas and Methods

1. **Mathematical Expression of Optimizer Update Rules**:
   \[ \theta_{t+1} = \theta_t - \gamma \cdot \varphi(\nabla_\theta L(\theta_t)) \]
   where \(\theta_t\) is the current parameter, \(\gamma\) is the learning rate, and \(\varphi\) is the update function applied to the gradient.
2. **Generation of the Super-Tree**:
   - Each node represents an operator, and edges represent input-output relationships.
   - Starting from the root node, operators are inserted step by step to generate child nodes until the predefined maximum depth \(N\) is reached.
3. **Monte Carlo Estimation**:
   \[ \text{score}(v) = \mathbb{E}\left[\sum_{i=1}^{T} R_i \mid v\right] \]
   where \(R_i\) is the average score of the optimizer obtained by expanding the path that starts from node \(v\).
4. **Rejection Sampling and Equivalent-Form Detection**:
   - Rejection Sampling: Exclude optimizers with poor performance.
   - Equivalent-Form Detection: Identify and remove mathematically equivalent optimizers through hashing.

### Summary

This paper proposes a novel and efficient optimizer search framework that addresses the shortcomings of existing methods in scalability and sample efficiency. By representing optimizer update rules as mathematical expression trees and combining them with an adapted Monte Carlo tree search algorithm, the framework can find high-performance optimizers within a limited number of evaluations.
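To make the expression-tree view and the hashing-based equivalent-form detection more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the operator set, the nested-tuple tree encoding, and the helper names `evaluate`, `fingerprint`, `deduplicate`, and `step` are all hypothetical. The idea shown is to evaluate each candidate update function on a few fixed random probe gradients and hash the rounded outputs, so that expressions computing the same function receive the same key.

```python
import hashlib
import numpy as np

# Hypothetical operator set; the summary does not list the paper's actual operators.
UNARY = {"neg": np.negative, "sign": np.sign, "abs": np.abs}
BINARY = {"add": np.add, "mul": np.multiply}

def evaluate(expr, g):
    """Evaluate an expression tree on gradient g.

    A tree is the string "g" (the gradient), a numeric constant,
    or a tuple (op, child, ...) with op in UNARY or BINARY.
    """
    if isinstance(expr, str):
        if expr != "g":
            raise ValueError(f"unknown leaf: {expr}")
        return g
    if isinstance(expr, (int, float)):
        return np.full_like(g, float(expr))
    op, *children = expr
    values = [evaluate(child, g) for child in children]
    if op in UNARY:
        return UNARY[op](values[0])
    return BINARY[op](values[0], values[1])

def fingerprint(expr, n_probes=4, dim=8, seed=0):
    """Hash the outputs of expr on fixed random probe gradients."""
    rng = np.random.default_rng(seed)
    probes = rng.standard_normal((n_probes, dim))
    # Round to absorb tiny float differences; "+ 0.0" maps -0.0 to 0.0.
    outputs = np.round(evaluate(expr, probes), 6) + 0.0
    return hashlib.sha256(outputs.tobytes()).hexdigest()

def deduplicate(candidates):
    """Keep the first candidate seen for each fingerprint."""
    seen, unique = set(), []
    for expr in candidates:
        key = fingerprint(expr)
        if key not in seen:
            seen.add(key)
            unique.append(expr)
    return unique

def step(theta, grad, expr, lr=0.1):
    """One update: theta_{t+1} = theta_t - lr * phi(grad), with phi given by expr."""
    return theta - lr * evaluate(expr, grad)

candidates = [
    ("add", "g", "g"),                     # g + g
    ("mul", 2.0, "g"),                     # 2 * g, equivalent to g + g
    ("sign", "g"),                         # sign(g)
    ("mul", ("sign", "g"), ("abs", "g")),  # sign(g) * |g|, equivalent to g
    "g",
]
unique = deduplicate(candidates)
```

In this toy run, `unique` keeps three of the five candidates, because `2 * g` duplicates `g + g` and the plain gradient `g` duplicates `sign(g) * |g|`. Probe-based hashing is only a heuristic: two different functions could in principle agree on every probe, so a real system may want more probes or a symbolic check.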
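The Monte Carlo estimation step can be illustrated in the same spirit. The sketch below assumes a toy super-tree in which a path is a sequence of operator names, and it estimates a node's score as the sample mean of rewards over random completions of that node's path. The operator list, `MAX_DEPTH`, `toy_reward`, and the greedy descent in `search` are hypothetical stand-ins: in the paper the reward would come from actually training a model with the candidate optimizer, and the estimator and traversal may differ from this simplification.

```python
import random

OPS = ["g", "sign", "neg", "scale"]  # hypothetical operator choices at each node
MAX_DEPTH = 3                        # hypothetical maximum path length (N in the text)

def rollout(prefix, rng):
    """Complete a partial path with uniformly random operators up to MAX_DEPTH."""
    path = list(prefix)
    while len(path) < MAX_DEPTH:
        path.append(rng.choice(OPS))
    return tuple(path)

def toy_reward(path):
    """Stand-in for evaluating an optimizer: the fraction of 'sign' operators."""
    return path.count("sign") / MAX_DEPTH

def mc_score(prefix, reward_fn, n_rollouts=32, seed=0):
    """Monte Carlo estimate of a node's score: mean reward over random rollouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        total += reward_fn(rollout(prefix, rng))
    return total / n_rollouts

def search(reward_fn, n_rollouts=32):
    """Greedy descent: at each level, move to the child with the best estimate."""
    path = ()
    while len(path) < MAX_DEPTH:
        path = max(
            (path + (op,) for op in OPS),
            key=lambda child: mc_score(child, reward_fn, n_rollouts),
        )
    return path

best_path = search(toy_reward)
```

Each call to `reward_fn` corresponds to one optimizer evaluation, so the number of rollouts per node directly controls the evaluation budget. This is where rejection sampling and equivalent-form detection would save cost, by skipping candidates before the expensive evaluation.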