Gradient-based Bi-level Optimization for Deep Learning: A Survey

Can Chen, Xi Chen, Chen Ma, Zixuan Liu, Xue Liu
2023-07-10
Abstract: Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta-knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer-level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Next, we delineate criteria to determine if a research problem is apt for bi-level optimization and provide a practical guide on structuring such problems into a bi-level optimization framework, a feature particularly beneficial for those new to this domain. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta-knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Finally, we wrap up the survey by highlighting two prospective future directions: (1) Effective Data Optimization for Science examined through the lens of task formulation. (2) Accurate Explicit Proxy Update analyzed from an optimization standpoint.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
The problem that this paper attempts to solve is **the application and challenges of bi-level optimization in deep learning**. Specifically, the paper focuses on how gradient-based bi-level optimization methods are used in deep learning tasks such as hyperparameter optimization and meta-knowledge extraction. Bi-level optimization is an optimization problem in which one problem is nested within another. Gradient-based bi-level optimization solves the outer-level task by computing the hypergradient, which is more efficient than classical methods such as evolutionary algorithms.

### Main contributions of the paper:

1. **Definition and criteria**:
   - Gives a formal definition of gradient-based bi-level optimization.
   - Sets out criteria for deciding whether a research problem is suited to bi-level optimization, and gives practical guidance on structuring such a problem as a bi-level optimization framework.
2. **Single-task and multi-task formulations**:
   - **Single-task formulation**: Used to optimize hyperparameters, such as regularization parameters and distilled data.
   - **Multi-task formulation**: Used to extract meta-knowledge, such as the model initialization.
3. **Solution methods**:
   - Discusses four gradient-based bi-level optimization solvers for updating the outer variable: explicit gradient update, proxy update, implicit function update, and closed-form update.
4. **Future directions**:
   - **Effective data optimization for science**: Examined from the perspective of task formulation.
   - **Accurate explicit proxy update**: Analyzed from an optimization perspective.

### Specific content of the paper:

- **Introduction**:
  - Introduces the concept of bi-level optimization and its importance in deep learning.
  - Outlines the applications of bi-level optimization in hyperparameter optimization and meta-knowledge extraction.
- **Definition**:
  - Defines the mathematical expression of the gradient-based bi-level optimization problem.
  - Lists the commonly used symbols and their meanings.
- **Task formulation**:
  - **Single-task formulation**: Applies to hyperparameter optimization within a single task.
  - **Multi-task formulation**: Applies to meta-knowledge extraction across multiple tasks.
- **Solution methods**:
  - Describes the four gradient-based bi-level optimization solvers in detail.
- **Future directions**:
  - Proposes two future research directions: data optimization for science and accurate explicit proxy update.

### Formula examples:

- **Mathematical expression of the bi-level optimization problem**:
  \[ \phi^* = \arg\min_{\phi} L_{\text{out}}(\theta^*(\phi), \phi) \]
  where
  \[ \theta^*(\phi) = \arg\min_{\theta} L_{\text{in}}(\theta, \phi) \]
- **Calculation of the hypergradient** (chain rule through the inner solution, with the partial derivatives of \(L_{\text{out}}\) evaluated at \(\theta = \theta^*(\phi)\)):
  \[ \frac{dL_{\text{out}}}{d\phi} = \frac{\partial L_{\text{out}}}{\partial\theta}\frac{\partial\theta^*(\phi)}{\partial\phi} + \frac{\partial L_{\text{out}}}{\partial\phi} \]

### Conclusion:

By systematically reviewing gradient-based bi-level optimization methods in deep learning, this paper gives researchers a guide to formulating and solving such problems in practice. It also identifies two directions for follow-up research.
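
To make the hypergradient formula from the formula examples concrete, here is a minimal sketch on a scalar toy problem. It is not taken from the paper: the quadratic losses, the constants `a` and `b`, the learning rates, and all function names are assumptions chosen for illustration. The inner variable θ is fit by gradient descent, ∂θ/∂φ is tracked through the unrolled inner steps (one simple way to realize an explicit gradient update), and the result is compared against the closed-form hypergradient, which this toy problem happens to admit.

```python
"""Toy bi-level problem (all losses, names, and constants are illustrative assumptions).

Inner:  L_in(theta, phi) = 0.5 * (theta - a)**2 + 0.5 * phi * theta**2
Outer:  L_out(theta)     = 0.5 * (theta - b)**2
Here phi plays the role of a regularization hyperparameter.
"""


def inner_grad(theta, phi, a):
    # dL_in/dtheta for the toy inner loss.
    return (theta - a) + phi * theta


def unrolled_hypergradient(phi, a, b, inner_lr=0.1, inner_steps=200):
    """Estimate dL_out/dphi by differentiating through the inner updates."""
    theta, dtheta_dphi = 0.0, 0.0
    for _ in range(inner_steps):
        # Derivative of the update theta <- theta - lr * inner_grad w.r.t. phi
        # (uses theta from before the update, so it is computed first).
        dtheta_dphi -= inner_lr * ((1.0 + phi) * dtheta_dphi + theta)
        theta -= inner_lr * inner_grad(theta, phi, a)
    dLout_dtheta = theta - b
    dLout_dphi_direct = 0.0  # L_out has no direct dependence on phi here
    return dLout_dtheta * dtheta_dphi + dLout_dphi_direct, theta


def analytic_hypergradient(phi, a, b):
    """Closed form for this toy problem: theta*(phi) = a / (1 + phi)."""
    theta_star = a / (1.0 + phi)
    dtheta_star_dphi = -a / (1.0 + phi) ** 2
    return (theta_star - b) * dtheta_star_dphi


def optimize_phi(phi, a, b, outer_lr=1.0, outer_steps=100):
    """Outer loop: plain gradient descent on phi using the hypergradient."""
    for _ in range(outer_steps):
        grad, _ = unrolled_hypergradient(phi, a, b)
        phi -= outer_lr * grad
    return phi


if __name__ == "__main__":
    a, b, phi0 = 2.0, 1.0, 0.5
    print("unrolled :", unrolled_hypergradient(phi0, a, b)[0])
    print("analytic :", analytic_hypergradient(phi0, a, b))
    print("phi after outer loop:", optimize_phi(phi0, a, b))
```

In this toy setting the outer loss is minimized when θ*(φ) = b, that is, at φ = a/b − 1, which gives a simple check on the outer loop. In realistic deep learning problems θ is high-dimensional and no closed form is available, which is where the solver choices discussed in the survey come in.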