Abstract: Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited by their scalability, generalizability, or sample efficiency. With the goal of democratizing research and application of optimizer search, we present the first efficient, scalable and generalizable framework that can directly search on the tasks of interest. We first observe that optimizer updates are fundamentally mathematical expressions applied to the gradient. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree, where each path encodes an optimizer. This way, optimizer search can be naturally formulated as a path-finding problem, allowing a variety of well-established tree traversal methods to be used as the search algorithm. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection that leverage the characteristics of optimizer update rules to further boost the sample efficiency. We provide a diverse set of tasks to benchmark our algorithm and demonstrate that, with only 128 evaluations, the proposed framework can discover optimizers that surpass both human-designed counterparts and prior optimizer search methods.
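The observation that an optimizer update is a mathematical expression applied to the gradient can be illustrated with a small sketch. The operator names (`sign`, `neg`, `square`, `add`, `mul`), the nested-tuple encoding, and the toy quadratic loss are assumptions made for illustration only; they are not the paper's actual operator set or benchmark.

```python
# Minimal sketch: an update rule as an expression tree over the gradient "g".
# Operator tables below are hypothetical, chosen only for illustration.
UNARY = {
    "sign": lambda x: float((x > 0) - (x < 0)),
    "neg": lambda x: -x,
    "square": lambda x: x * x,
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def evaluate(expr, g):
    """Evaluate an expression tree on a scalar gradient g.

    A tree is the leaf "g", a numeric constant, or a tuple
    (operator_name, child, ...) whose children are trees.
    """
    if isinstance(expr, str) and expr == "g":
        return g
    if isinstance(expr, (int, float)):
        return float(expr)
    op, *children = expr
    values = [evaluate(child, g) for child in children]
    if op in UNARY:
        return UNARY[op](values[0])
    return BINARY[op](values[0], values[1])


def step(theta, grad, expr, lr):
    """One update: theta <- theta - lr * expr(grad)."""
    return theta - lr * evaluate(expr, grad)


def minimize(expr, lr=0.1, steps=50, theta=3.0):
    """Run the update rule on the toy loss L(theta) = theta^2 (gradient 2*theta)."""
    for _ in range(steps):
        theta = step(theta, 2.0 * theta, expr, lr)
    return theta


sgd = "g"                    # plain gradient descent
sign_sgd = ("sign", "g")     # sign-based descent
```

Under this encoding, `minimize(sgd)` and `minimize(sign_sgd)` run two different update rules on the same toy problem, and any other tree built from the tables above can be plugged in the same way.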

What problem does this paper attempt to address?

This paper aims to solve the problems of efficiency, scalability, and generalization ability in automatic optimizer design. Specifically, the authors focus on how to efficiently search for and design optimizers suitable for various tasks in automated machine learning (AutoML) systems.

#### Background and Motivation

1. **Limitations of Existing Methods**:
   - **Scalability**: Traditional parameterized optimizers (such as L2O) do not scale well to large-scale models or datasets because they require expensive meta-learning steps (for example, backpropagation through gradient descent).
   - **Sample Efficiency**: Non-parameterized optimizer search frameworks (such as NOS-RL) perform relatively well but require a large number of evaluations (more than 10,000), which makes their computational cost too high for practical applications.
   - **Generalization Ability**: Existing optimizers often perform poorly on small variants of their training tasks, limiting their use as general-purpose optimizers.
2. **Research Objectives**:
   - To provide an efficient, scalable, and generalizable optimizer search framework that can be directly applied to various tasks.
   - To reduce the computational cost of optimizer search, making it more accessible to researchers and practitioners.

#### Main Contributions

1. **Proposing a New Search Space Representation**:
   - Treat optimizer update rules as mathematical expressions and reorganize them into a "super-tree" by using their inherent tree structure. Each path encodes an optimizer, which turns optimizer search into a path-finding problem.
2. **Adopting an Improved Monte Carlo Tree Search Algorithm**:
   - Use Monte Carlo sampling (MCT) combined with rejection sampling and equivalent-form detection, which significantly improves sample efficiency. These techniques help avoid repeated evaluations of inefficient optimizers and redundant expressions.
3. **Extensive Experimental Verification**:
   - Evaluations were carried out on multiple tasks, including handwritten digit classification, image classification, graph neural network training, adversarial attacks, and BERT fine-tuning. The results show that the proposed framework can discover optimizers superior to human-designed optimizers and to those found by other automatic search methods with only 128 evaluations.

### Formulas and Methods

1. **Mathematical Expression of Optimizer Update Rules**:
   \[ \theta_{t+1} = \theta_t - \gamma \cdot \varphi(\nabla_\theta L(\theta_t)) \]
   where \(\theta_t\) is the current parameter, \(\gamma\) is the learning rate, and \(\varphi\) is the update function.
2. **Generation of the Super-Tree**:
   - Each node represents an operator, and edges represent input-output relationships.
   - Starting from the root node, different operators are gradually inserted to generate child nodes until the predefined maximum depth \(N\) is reached.
3. **Monte Carlo Estimation**:
   \[ \text{score}(v) = \mathbb{E}\left[\sum_{i=1}^{T} R_i \mid v\right] \]
   where \(R_i\) is the average score of the optimizer obtained after expanding the path starting from node \(v\).
4. **Rejection Sampling and Equivalent Form Detection**:
   - Rejection Sampling: Exclude optimizers with poor performance.
   - Equivalent Form Detection: Identify and remove mathematically equivalent optimizers through hashing methods.

### Summary

This paper proposes a novel and efficient optimizer search framework that addresses the shortcomings of existing methods in scalability and sample efficiency. By representing optimizer update rules as mathematical expression trees and combining them with an improved Monte Carlo tree search algorithm, the framework can find high-performance optimizers within a limited number of evaluations.
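The interplay of sampling, rejection, and equivalent-form detection can be made concrete with a rough sketch. It is not the paper's algorithm: uniform random sampling of operator paths stands in for the Monte Carlo tree search, the rejection test (keep only rules that point in a descent direction on a few probe gradients) is an assumed cheap filter, and equivalence is detected by storing each rule's outputs on fixed probe gradients in a hash set. The operator set, probe values, and toy quadratic loss are all hypothetical.

```python
import random

# Hypothetical unary operators; a path applies them to the gradient in order.
OPS = {
    "identity": lambda x: x,
    "neg": lambda x: -x,
    "sign": lambda x: float((x > 0) - (x < 0)),
    "half": lambda x: 0.5 * x,
    "double": lambda x: 2.0 * x,
}
PROBES = (-2.0, -0.5, 0.5, 2.0)  # fixed probe gradients (an assumption)


def apply_path(path, g):
    """Apply the operators named in `path` to the gradient g, left to right."""
    for name in path:
        g = OPS[name](g)
    return g


def fingerprint(path):
    """Outputs on the probe gradients; equal fingerprints are treated as equivalent rules."""
    return tuple(round(apply_path(path, g), 9) for g in PROBES)


def is_descent(path):
    """Assumed rejection test: the update must share the gradient's sign on every probe."""
    return all(apply_path(path, g) * g > 0 for g in PROBES)


def final_loss(path, lr=0.1, steps=30, theta=3.0):
    """Score a rule on the toy loss L(theta) = theta^2 (gradient 2*theta)."""
    for _ in range(steps):
        theta -= lr * apply_path(path, 2.0 * theta)
    return theta * theta


def search(budget=128, max_depth=3, seed=0, max_attempts=10000):
    """Sample paths at random; evaluate only new, non-rejected candidates."""
    rng = random.Random(seed)
    names = sorted(OPS)
    seen = set()                      # hash set of fingerprints already sampled
    best = (float("inf"), None)
    evals = 0
    for _ in range(max_attempts):
        if evals >= budget:
            break
        depth = rng.randint(1, max_depth)
        path = tuple(rng.choice(names) for _ in range(depth))
        key = fingerprint(path)
        if key in seen:               # equivalent form already handled: skip
            continue
        seen.add(key)
        if not is_descent(path):      # rejected before any costly evaluation
            continue
        evals += 1
        loss = final_loss(path)
        if loss < best[0]:
            best = (loss, path)
    return best, evals
```

Calling `search()` returns the best `(loss, path)` pair found and the number of full evaluations spent. Because duplicates and rejected candidates are filtered before `final_loss` runs, the evaluation budget is consumed only by distinct, plausible update rules, which is the sample-efficiency argument the summary makes.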

Efficient Non-Parametric Optimizer Search for Diverse Tasks

A Generalizable Approach to Learning Optimizers

A Tabu Search Based Metaheuristic for Fast Global Optimizations of Inverse Problems

AutoOpt: A General Framework for Automatically Designing Metaheuristic Optimization Algorithms with Diverse Structures

Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks

HUB: Guiding Learned Optimizers with Continuous Prompt Tuning

Derivative-free tree optimization for complex systems

Improving Performance Insensitivity of Large-Scale Multiobjective Optimization via Monte Carlo Tree Search

Auptimizer -- an Extensible, Open-Source Framework for Hyperparameter Tuning

Adaptive Optimizer for Automated Hyperparameter Optimization Problem

On Empirical Comparisons of Optimizers for Deep Learning

Optimization-Driven Adaptive Experimentation

Optimizer Benchmarking Needs to Account for Hyperparameter Tuning

An algorithmic framework for the optimization of deep neural networks architectures and hyperparameters

An Adaptive Stochastic Dominant Learning Swarm Optimizer for High-Dimensional Optimization

Monte Carlo Tree Search based Space Transfer for Black-box Optimization

Efficient Task Grouping Through Samplewise Optimisation Landscape Analysis

Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs

Subtraction-Average-Based Optimizer: A New Swarm-Inspired Metaheuristic Algorithm for Solving Optimization Problems

Practical tradeoffs between memory, compute, and performance in learned optimizers