Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs

Lucy L. Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang
2024-01-21
Abstract: Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.
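As a minimal sketch of why a Moreau envelope supports a single-level reformulation, assume an unconstrained lower level and write $v_\gamma(x,y) = \min_\theta \{ f(x,\theta) + \frac{1}{2\gamma}\|\theta - y\|^2 \}$ (notation assumed here, not necessarily the paper's). When $f(x,\cdot)$ is convex, the gap $f(x,y) - v_\gamma(x,y)$ is nonnegative and vanishes exactly when $y$ minimizes $f(x,\cdot)$, so lower-level optimality can be expressed as the constraint $f(x,y) - v_\gamma(x,y) \le 0$. The toy objective $f(x,y) = \tfrac12 (y - \sin x)^2$ (convex in $y$, not in $x$) and the names `f`, `moreau_value`, and `gap` are hypothetical illustrations.

```python
import math
from typing import Tuple


# Hypothetical toy lower-level objective (illustration only):
# f(x, y) = 0.5 * (y - sin x)^2 is convex in y but not convex in x.
def f(x: float, y: float) -> float:
    return 0.5 * (y - math.sin(x)) ** 2


def df_dy(x: float, y: float) -> float:
    return y - math.sin(x)


def moreau_value(x: float, y: float, gamma: float, iters: int = 500) -> Tuple[float, float]:
    """Approximate v_gamma(x, y) = min_theta f(x, theta) + (theta - y)^2 / (2*gamma).

    The inner objective is strongly convex in theta, so gradient descent with
    step 1/L converges; L = 1 + 1/gamma is its curvature for this quadratic f.
    Returns (value, minimizer theta).
    """
    L = 1.0 + 1.0 / gamma
    theta = y
    for _ in range(iters):
        grad = df_dy(x, theta) + (theta - y) / gamma
        theta -= grad / L
    return f(x, theta) + (theta - y) ** 2 / (2.0 * gamma), theta


def gap(x: float, y: float, gamma: float) -> float:
    """f(x, y) - v_gamma(x, y): nonnegative, zero iff y minimizes f(x, .)."""
    value, _ = moreau_value(x, y, gamma)
    return f(x, y) - value
```

For example, with $\gamma = 0.5$, `gap(0.3, math.sin(0.3), 0.5)` returns `0.0`, while `gap(0.3, 1.0, 0.5)` is positive and, for this quadratic toy, matches the closed form $\frac{\gamma}{2(1+\gamma)}(y - \sin x)^2$.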
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the solution of bilevel programs (BPs). Specifically, it proposes a single-level difference-of-weakly-convex reformulation based on the Moreau envelope, aimed at bilevel programs whose lower-level problem is convex only in the lower-level variables.

#### Background and motivation

1. **Applications of bilevel programming**:
   - Bilevel programming is an important tool in machine learning, especially for hyperparameter selection. For example, in support vector machines (SVMs) and regularized regression, hyperparameters can be chosen by solving a bilevel program in which the lower level trains the model and the upper level evaluates it.
2. **Limitations of existing methods**:
   - Existing value function-based approaches, such as the difference of convex algorithm of Ye et al. (2023), assume that the lower-level problem is convex jointly in the upper- and lower-level variables. This assumption fails in many practical applications, notably hyperparameter tuning for certain prediction algorithms such as kernel support vector machines.

#### Main contributions of the paper

1. **New reformulation**:
   - The paper proposes a single-level difference-of-weakly-convex reformulation based on the Moreau envelope of the lower-level problem, which only requires the lower-level problem to be convex in the lower-level variables.
   - The Moreau envelope regularizes a function by infimal convolution with a quadratic: it yields a smoother lower approximation with the same minimum value (and, for convex functions, the same minimizers).
2. **Algorithm design**:
   - Based on this reformulation, the paper develops a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA).
   - At each iteration, the algorithm linearizes the concave part of the difference-of-weakly-convex program and adds a proximal term, producing a strongly convex subproblem that is solved inexactly.
3. **Theoretical analysis**:
   - The paper analyzes the properties of the Moreau envelope-based function in detail, including weak convexity and Lipschitz continuity.
   - It further proves that, under appropriate conditions, iP-DwCA converges to a KKT point of the approximate problem.
4. **Numerical experiments**:
   - To verify the effectiveness of the proposed method, the paper conducts numerical experiments on hyperparameter tuning for kernel support vector machines with simulated data.

### Summary

By introducing a Moreau envelope-based difference-of-weakly-convex reformulation, the paper extends bilevel programming methods to problems whose lower-level problem is convex only in the lower-level variables. The proposed iP-DwCA comes with a sequential convergence guarantee and is evaluated numerically on kernel support vector machine hyperparameter tuning.
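The algorithm-design idea in the contributions list (linearize the concave part, add a proximal term, solve a strongly convex subproblem inexactly) can be illustrated with a small Python sketch. It is a deliberately simplified, penalized variant on a hypothetical toy problem, not the paper's iP-DwCA. The penalty weight `C`, proximal weight `BETA`, envelope parameter `GAMMA`, the fixed inner gradient loop, and all function names are assumptions made for illustration. The concave part is $-v_\gamma$, and its gradient is obtained from the unique inner minimizer $\theta^*$.

```python
import math

# Hypothetical toy bilevel problem (illustration only, not from the paper):
#   upper level  F(x, y) = 0.5*(x - 1)^2 + 0.5*(y - 0.5)^2
#   lower level  f(x, y) = 0.5*(y - sin x)^2  (convex in y, not jointly convex)
GAMMA = 1.0  # Moreau envelope parameter (assumed value)
C = 20.0     # penalty weight on f - v_gamma (assumed; a simplification)
BETA = 30.0  # proximal weight (assumed large enough for this toy)


def grad_F(x, y):
    return x - 1.0, y - 0.5


def grad_f(x, y):
    r = y - math.sin(x)
    return -r * math.cos(x), r


def inner_minimizer(x, y):
    # theta* = argmin_theta f(x, theta) + (theta - y)^2 / (2*GAMMA);
    # closed form because f is quadratic in its second argument.
    return (y + GAMMA * math.sin(x)) / (1.0 + GAMMA)


def grad_v(x, y):
    # Gradient of v_gamma through the unique inner minimizer theta*:
    # d/dx = d/dx f(x, theta*),  d/dy = (y - theta*) / GAMMA.
    theta = inner_minimizer(x, y)
    return -(theta - math.sin(x)) * math.cos(x), (y - theta) / GAMMA


def gap(x, y):
    # f(x, y) - v_gamma(x, y): nonnegative, zero exactly when y = sin x.
    theta = inner_minimizer(x, y)
    v = 0.5 * (theta - math.sin(x)) ** 2 + (theta - y) ** 2 / (2.0 * GAMMA)
    return 0.5 * (y - math.sin(x)) ** 2 - v


def dc_step(xk, yk, inner_iters=200, step=1.0 / 120.0):
    """One inexact DC-type step: freeze grad v_gamma at (xk, yk), which
    linearizes the concave part -v_gamma, add a proximal term, and run a fixed
    number of gradient steps on
        F(z) + C * (f(z) - <grad v_gamma(z_k), z>) + BETA/2 * ||z - z_k||^2.
    """
    gvx, gvy = grad_v(xk, yk)
    x, y = xk, yk
    for _ in range(inner_iters):
        gFx, gFy = grad_F(x, y)
        gfx, gfy = grad_f(x, y)
        x_new = x - step * (gFx + C * (gfx - gvx) + BETA * (x - xk))
        y_new = y - step * (gFy + C * (gfy - gvy) + BETA * (y - yk))
        x, y = x_new, y_new
    return x, y


x, y = 0.0, 1.0
initial_gap = gap(x, y)
for _ in range(300):
    x, y = dc_step(x, y)
final_gap = gap(x, y)
print(f"x={x:.4f}, y={y:.4f}, gap: {initial_gap:.4f} -> {final_gap:.2e}")
```

Starting from $(x, y) = (0, 1)$, where the gap is $0.25$, the iterates move toward the lower-level solution set $y = \sin x$ while reducing the upper-level objective. Because this sketch uses a finite penalty, the gap ends small but generally not exactly zero.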