A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

Feiyang Ye, Baijiong Lin, Xiaofeng Cao, Yu Zhang, Ivor Tsang
2024-07-10
Abstract: In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem. Existing gradient-based MOBLO algorithms need to compute the Hessian matrix, which makes them computationally inefficient. To address this, we propose an efficient first-order multi-gradient method for MOBLO, called FORUM. Specifically, we reformulate MOBLO problems as a constrained multi-objective optimization (MOO) problem via the value-function approach. Then we propose a novel multi-gradient aggregation method to solve the challenging constrained MOO problem. Theoretically, we provide a complexity analysis to show the efficiency of the proposed method, along with a non-asymptotic convergence result. Empirically, extensive experiments demonstrate the effectiveness and efficiency of the proposed FORUM method in different learning problems. In particular, it achieves state-of-the-art performance on three multi-task learning benchmark datasets. The code is available at https://github.com/Baijiong-Lin/FORUM.
Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper studies the multi-objective bi-level optimization (MOBLO) problem, in which the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem. Existing gradient-based MOBLO algorithms must compute the Hessian matrix, which makes them computationally expensive. To address this, the authors propose FORUM, an efficient first-order multi-gradient method.

### Main Contributions

1. **The FORUM method**: An efficient first-order, gradient-based MOBLO algorithm.
2. **Theoretical analysis**: A complexity analysis showing FORUM's efficiency in time and memory cost, together with a non-asymptotic convergence analysis.
3. **Experimental validation**: Experiments on multiple learning tasks that verify FORUM's effectiveness and efficiency, including state-of-the-art performance on three multi-task learning benchmark datasets.

### Method Overview

#### 1. Problem Reformulation

The authors reformulate the MOBLO problem as an equivalent constrained multi-objective optimization problem using the value-function approach:

\[ \min_{\alpha \in \mathbb{R}^n, \omega \in \mathbb{R}^p} F(\alpha, \omega) \quad \text{s.t.} \quad f(\alpha, \omega) \leq f^*(\alpha) \]

Here \( F \) is the vector of upper-level objectives, \( f \) is the lower-level objective, and \( f^*(\alpha) = \min_{\omega} f(\alpha, \omega) \) is the lower-level value function.

#### 2. Multi-Gradient Aggregation Method

To solve this constrained multi-objective optimization problem, the authors propose a multi-gradient aggregation method. In the \( k \)-th iteration, the variable \( z_k \) (the stacked variables \( (\alpha, \omega) \)) is updated as \( z_{k+1} = z_k + \mu d_k \), where \( \mu \) is the step size and \( d_k \) is the update direction. The direction \( d_k \) is chosen to decrease the upper-level objective \( F(z) \) and the constraint function \( e_q(z) \) simultaneously.

#### 3. Dynamic Constraint Optimization

To drive the iterates toward satisfying the constraint \( e_q(z) \leq 0 \), a dynamic threshold \( \phi_k \) is introduced that adjusts how strongly the constraint is enforced during optimization.
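The three steps above can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' implementation: the toy objectives, the helper names, the first-order treatment of the approximate lower-level solution, the choice \( \phi_k = \rho \|\nabla q\|^2 \), and the fixed uniform weights over the upper-level gradients are all hypothetical simplifications. In particular, the paper's own aggregation step for combining the upper-level gradients is not reproduced here.

```python
# Toy problem (hypothetical, for illustration only):
#   lower level: f(a, w) = 0.5 * (w - a)**2, so w*(a) = a and f*(a) = 0
#   upper level: F1(a, w) = (w - 1)**2,  F2(a, w) = (w + 1)**2


def f(a, w):
    return 0.5 * (w - a) ** 2


def approx_lower_solution(a, w, T=5, eta=0.5):
    """Approximate w*(a) with T gradient-descent steps on f, started at w."""
    w_tilde = w
    for _ in range(T):
        w_tilde -= eta * (w_tilde - a)  # df/dw = w - a
    return w_tilde


def constraint(a, w):
    """Return q(z) = f(a, w) - f(a, w_tilde) and a first-order gradient."""
    w_tilde = approx_lower_solution(a, w)
    q = f(a, w) - f(a, w_tilde)
    # Assumption: w_tilde is treated as a constant, so no Hessian is needed.
    grad_q = (w_tilde - w, w - a)  # (dq/da, dq/dw)
    return q, grad_q


def direction(a, w, rho=0.5):
    """Direction d with grad_q . d <= -phi, where phi = rho * ||grad_q||^2."""
    # Simplification: uniform weights over the two upper-level gradients.
    g = (0.0, 0.5 * 2 * (w - 1) + 0.5 * 2 * (w + 1))
    _, gq = constraint(a, w)
    norm_sq = gq[0] ** 2 + gq[1] ** 2
    phi = rho * norm_sq  # assumed form of the dynamic threshold
    dot = gq[0] * g[0] + gq[1] * g[1]
    # Smallest multiplier that enforces the descent condition on q.
    lam = max((phi - dot) / norm_sq, 0.0) if norm_sq > 1e-12 else 0.0
    d = (-(g[0] + lam * gq[0]), -(g[1] + lam * gq[1]))
    return d, gq, phi


def run(a=2.0, w=-1.0, mu=0.1, steps=500):
    for _ in range(steps):
        d, _, _ = direction(a, w)
        a, w = a + mu * d[0], w + mu * d[1]  # z_{k+1} = z_k + mu * d_k
    return a, w
```

Calling `run()` from the default start drives the iterates toward `a = w = 0`, the point that satisfies the lower-level constraint and minimizes the uniformly weighted upper-level objectives on this toy problem. Only first-order derivatives of `f` are used, which is the property the summary attributes to FORUM.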
### Experimental Results

#### 1. Data Hyper-Cleaning

Experiments on the MNIST and Fashion-MNIST datasets show that FORUM outperforms the existing MOML and MoCo methods in classification accuracy and F1 score.

#### 2. Multi-Task Learning

Experiments were conducted on three benchmark datasets: Office-31, NYUv2, and QM9. FORUM performs strongly across these multi-task learning benchmarks and achieves the highest average classification accuracy on Office-31.

### Conclusion

The proposed FORUM method is effective for multi-objective bi-level optimization problems: it has lower time and space complexity in theory than Hessian-based MOBLO algorithms and performs well in practical applications.