Abstract: Multi-objective optimization (MOO) is receiving growing attention in fields such as multi-task learning. Recent works provide effective algorithms with theoretical analysis, but they rely on the standard $L$-smooth or bounded-gradient assumptions, which typically do not hold for neural networks such as long short-term memory (LSTM) models and Transformers. In this paper, we study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm. We revisit and analyze the fundamental multiple gradient descent algorithm (MGDA) and its stochastic version with double sampling for solving generalized $\ell$-smooth MOO problems; both algorithms approximate the conflict-avoidant (CA) direction, which maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of these algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the update direction and the CA direction) over all iterations, requiring a total of $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples in the deterministic and stochastic settings, respectively. We prove that they can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we analyze an efficient variant of MGDA named MGDA-FA, which uses only $\mathcal{O}(1)$ time and space while achieving the same performance guarantee as MGDA.
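
To make the MGDA update concrete, the following is a minimal sketch of the deterministic two-objective case, where the min-norm convex combination of the two gradients has a closed form. It is an illustration under stated assumptions, not the paper's implementation: the toy quadratic objectives, the step size `lr`, the iteration count, and all function names are hypothetical, and the stochastic double-sampling version and MGDA-FA are not shown.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def min_norm_weight(g1, g2):
    """Return w in [0, 1] minimizing ||w*g1 + (1-w)*g2||^2.

    Closed form for two gradients: w = (g2 - g1).g2 / ||g1 - g2||^2,
    clipped to the interval [0, 1].
    """
    diff = [b - a for a, b in zip(g1, g2)]  # g2 - g1
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same direction
    w = dot(diff, g2) / denom
    return min(1.0, max(0.0, w))


def grads(x):
    """Gradients of two hypothetical toy objectives (assumed for illustration):
    f1(x) = (x0 - 1)^2 + x1^2 and f2(x) = (x0 + 1)^2 + x1^2.
    Their Pareto set is the segment x1 = 0, -1 <= x0 <= 1.
    """
    g1 = [2.0 * (x[0] - 1.0), 2.0 * x[1]]
    g2 = [2.0 * (x[0] + 1.0), 2.0 * x[1]]
    return g1, g2


def mgda(x, lr=0.1, steps=200):
    """Run deterministic two-objective MGDA with a fixed step size."""
    x = list(x)
    for _ in range(steps):
        g1, g2 = grads(x)
        w = min_norm_weight(g1, g2)
        # Update direction: min-norm convex combination of the gradients.
        d = [w * a + (1.0 - w) * b for a, b in zip(g1, g2)]
        x = [xi - lr * di for xi, di in zip(x, d)]
    return x
```

Calling `mgda([0.5, 2.0])` moves the iterate toward the Pareto set of the toy problem; the norm of the combined direction shrinking to zero is the Pareto stationarity condition the abstract refers to.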

A Modified Oja–Xu MCA Learning Algorithm and Its Convergence Analysis

Convergence Analysis of Graph Regularized Non-Negative Matrix Factorization

Modified Nonlinear Adaptive Observer Based on Strong Tracking Filter

Convergence of Oja's online principal component flow

Convergence Analysis of the Modified Frequency-Domain Block LMS Algorithm with Guaranteed Optimal Steady State Performance

On the convergence analysis of the transform domain normalized LMS and related M-estimate algorithms

On the Optimal Tradeoff Between Computational Efficiency and Generalizability of Oja's Algorithm

Global Convergence of Oja's Subspace Algorithm for Principal Component Extraction

Finite-Time Convergence Rates of Decentralized Local Markovian Stochastic Approximation

MGDA Converges under Generalized Smoothness, Provably

A Mixed Evolutionary Algorithm to Solve the O-D Matrix Estimation Problem

Convergence Analysis of Generalized Nonlinear Inexact Uzawa Algorithm for Stabilized Saddle Point Problems

A convergence analysis of the method of codifferential descent

Optimization-Derived Learning with Essential Convergence Analysis of Training and Hyper-training

Diffusion approximations of Oja's online principal component analysis

Optimal Variable Step-Size LMS Model and Algorithm with Independence Assumption

The Learning Convergence of CMAC in Cyclic Learning

Unified Convergence Analysis for Adaptive Optimization with Moving Average Estimator

Convergence Analysis of the Alternating Anderson-Picard Method for Nonlinear Fixed-point Problems

On the Global Convergence of Majorization Minimization Algorithms for Nonconvex Optimization Problems

Comments and Corrections: Convergence Analysis on Trace Ratio Linear Discriminant Analysis Algorithms