Abstract: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another, and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science, examined through the lens of task formulation, and (2) Accurate Explicit Proxy Update, analyzed from an optimization standpoint.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is **the application and challenges of bi-level optimization in deep learning**. Specifically, the paper focuses on how gradient-based bi-level optimization methods are used in deep learning tasks such as hyperparameter optimization and meta-knowledge extraction. Bi-level optimization is an optimization problem in which one problem is nested within another. Gradient-based bi-level optimization solves the outer-level task by computing the hypergradient, which is more efficient than classical methods such as evolutionary algorithms.

### Main contributions of the paper:

1. **Definition and classification**:
   - Gives a formal definition of gradient-based bi-level optimization.
   - Sets out criteria for deciding whether a research problem is suitable for bi-level optimization, and provides practical guidelines on how to structure such a problem as a bi-level optimization framework.
2. **Single-task and multi-task formulation**:
   - **Single-task formulation**: Used to optimize hyperparameters, such as regularization parameters and distilled data.
   - **Multi-task formulation**: Used to extract meta-knowledge, such as the model initialization.
3. **Solution methods**:
   - Discusses four gradient-based bi-level optimization solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update.
4. **Future directions**:
   - **Effective data optimization for science**: Examined from the perspective of task formulation.
   - **Accurate explicit proxy update**: Analyzed from an optimization perspective.

### Specific content of the paper:

- **Introduction**:
  - Introduces the concept of bi-level optimization and its importance in deep learning.
  - Outlines the applications of bi-level optimization in hyperparameter optimization and meta-knowledge extraction.
- **Definition**:
  - Defines the mathematical expression of the gradient-based bi-level optimization problem.
  - Lists in detail the commonly used symbols and their meanings.
- **Task formulation**:
  - **Single-task formulation**: Applicable to hyperparameter optimization in a single task.
  - **Multi-task formulation**: Applicable to meta-knowledge extraction across multiple tasks.
- **Solution methods**:
  - Introduces in detail four gradient-based bi-level optimization solvers.
- **Future directions**:
  - Proposes two future research directions, emphasizing the importance of data optimization and proxy update.

### Formula examples:

- **Mathematical expression of the bi-level optimization problem**:
  \[ \phi^*=\arg\min_{\phi}L_{\text{out}}(\theta^*(\phi),\phi) \]
  where
  \[ \theta^*(\phi)=\arg\min_{\theta}L_{\text{in}}(\theta,\phi) \]
- **Calculation of the hypergradient**:
  \[ \frac{dL_{\text{out}}}{d\phi}=\frac{\partial L_{\text{out}}}{\partial\theta}\frac{d\theta^*(\phi)}{d\phi}+\frac{\partial L_{\text{out}}}{\partial\phi} \]
  with the partial derivatives evaluated at \(\theta=\theta^*(\phi)\).

### Conclusion:

By systematically reviewing gradient-based bi-level optimization methods in deep learning, this paper gives researchers a comprehensive guide to applying the technique in practice. It also points out potential directions for follow-up research.
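
### Illustrative sketch:

The hypergradient formula above can be made concrete with a small example. The following is a minimal sketch on a toy scalar problem that is not taken from the paper: the inner loss, the outer loss, the constants `A` and `B`, and all function names are assumptions chosen so that \(\theta^*(\phi)\) has a closed form. It obtains \(d\theta^*/d\phi\) from the implicit function theorem, which is one plausible reading of the "implicit function update" solver, and checks the result against a finite-difference estimate.

```python
"""Toy hypergradient example (illustrative assumptions; not from the survey)."""

A = 2.0  # hypothetical target seen by the inner (training) loss
B = 1.0  # hypothetical target seen by the outer (validation) loss


def inner_solution(phi):
    """theta*(phi) = argmin_theta 0.5*(theta - A)**2 + 0.5*phi*theta**2."""
    return A / (1.0 + phi)


def outer_loss(phi):
    """L_out(theta*(phi), phi) = 0.5*(theta*(phi) - B)**2 (no direct phi term)."""
    return 0.5 * (inner_solution(phi) - B) ** 2


def hypergradient(phi):
    """dL_out/dphi = dL_out/dtheta * dtheta*/dphi + direct partial wrt phi."""
    theta_star = inner_solution(phi)
    dlout_dtheta = theta_star - B
    dlout_dphi_direct = 0.0  # the outer loss does not depend on phi directly
    # Implicit function theorem on dL_in/dtheta = (theta - A) + phi*theta = 0:
    hessian_theta_theta = 1.0 + phi  # second derivative of L_in wrt theta
    mixed_theta_phi = theta_star     # mixed derivative of L_in wrt theta, phi
    dtheta_dphi = -mixed_theta_phi / hessian_theta_theta
    return dlout_dtheta * dtheta_dphi + dlout_dphi_direct


def finite_difference_hypergradient(phi, eps=1e-6):
    """Central-difference estimate of the same derivative, used as a check."""
    return (outer_loss(phi + eps) - outer_loss(phi - eps)) / (2.0 * eps)


def optimize_outer(phi0=0.0, lr=0.2, steps=500):
    """Plain gradient descent on the outer variable phi using the hypergradient."""
    phi = phi0
    for _ in range(steps):
        phi -= lr * hypergradient(phi)
    return phi


if __name__ == "__main__":
    print("analytic hypergradient:", hypergradient(0.3))
    print("finite-difference check:", finite_difference_hypergradient(0.3))
    print("outer variable after descent:", optimize_outer())
```

In this toy setting the outer loss is minimized when \(\theta^*(\phi)=B\), so the descent should drive \(\phi\) toward \(A/B-1\). In realistic deep learning problems the inner solution has no closed form and the Hessian term must be approximated, which is where the four solvers differ.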

Gradient-based Bi-level Optimization for Deep Learning: A Survey

Investigating Bi-Level Optimization for Learning and Vision From a Unified Perspective: A Survey and Beyond

A Globally Convergent Gradient-based Bilevel Hyperparameter Optimization Method

A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

A Gradient-based Bilevel Optimization Approach for Tuning Hyperparameters in Machine Learning

On Penalty-based Bilevel Gradient Descent Method

Bilevel Optimization for Machine Learning: Algorithm Design and Convergence Analysis

Efficient Method for Bi-level Optimization with Non-smooth Lower-Level Problem

A Survey on Multi-Objective based Parameter Optimization for Deep Learning

Transfer Learning for Bayesian Optimization: A Survey

Improved Penalty Method Via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization

A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning

An Introduction to Bi-level Optimization: Foundations and Applications in Signal Processing and Machine Learning

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Machine Learning Optimization Techniques: A Survey, Classification, Challenges, and Future Research Issues

Towards Differentiable Multilevel Optimization: A Gradient-Based Approach

Scalable Nested Optimization for Deep Learning

Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms

Open Problems in Applied Deep Learning

Gradient-based algorithms for multi-objective bi-level optimization