Linearization Algorithms for Fully Composite Optimization

Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion
DOI: https://doi.org/10.48550/arXiv.2302.12808
2023-07-12
Abstract: This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets. We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts. This provides us with new generalizations of the classical Frank-Wolfe method and the Conditional Gradient Sliding algorithm, that cater to a subclass of non-differentiable problems. Our algorithms rely on a stronger version of the linear minimization oracle, which can be efficiently implemented in several practical applications. We provide the basic version of our method with an affine-invariant analysis and prove global convergence rates for both convex and non-convex objectives. Furthermore, in the convex case, we propose an accelerated method with correspondingly improved complexity. Finally, we provide illustrative experiments to support our theoretical results.
Optimization and Control, Machine Learning
What problem does this paper attempt to address?
This paper addresses a class of fully composite optimization problems of the form \[ \min_{x \in X} \phi(x) \triangleq F(f(x), x), \] where \(X\) is a convex and compact set, \(F: \mathbb{R}^n \times X \rightarrow \mathbb{R}\) is a simple convex function that may be non-smooth, and \(f: X \rightarrow \mathbb{R}^n\) is a smooth mapping that accounts for most of the computational cost. Problems of this kind arise in many applications and generalize several classical composite optimization settings.
The paper exploits the split between the smooth and non-smooth parts of the objective: only the smooth mapping \(f\) is linearized, while \(F\) is kept intact. This yields new generalizations of the classical Frank-Wolfe method and the Conditional Gradient Sliding algorithm that apply to a subclass of non-smooth problems. The methods rely on a stronger version of the linear minimization oracle, which can be implemented efficiently in several practical applications.
The main contributions of the paper are:
- A basic method for the problem above, with an affine-invariant analysis and global convergence rates of \(O(1/k)\) for convex objectives and \(\tilde{O}(1/\sqrt{k})\) for non-convex objectives.
- An accelerated method for the convex case, with a convergence rate of \(O(1/k^2)\) and the optimal \(O(\epsilon^{-1/2})\) oracle complexity measured in the number of computations of \(\nabla f\).
- Numerical experiments that illustrate the proposed methods.
Overall, the paper proposes an algorithmic framework for structured non-smooth (and possibly non-convex) optimization problems whose solution can be simplified by linearizing only the smooth part.
The methods come with theoretical convergence guarantees, and the paper's illustrative experiments support these results.
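To make the idea of linearizing only the smooth part concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the max-type instance \(F(u, x) = \max_i u_i\), a one-dimensional feasible set \(X = [lo, hi]\), and the standard Frank-Wolfe step size \(\gamma_k = 2/(k+2)\). The function names `composite_lmo` and `fully_composite_fw` are hypothetical. In this setting the stronger oracle, \(\min_{y \in X} F(f(x) + \nabla f(x)(y - x), y)\), reduces to minimizing a maximum of lines over an interval, whose minimizer lies at an endpoint or at an intersection of two lines.

```python
from itertools import combinations


def composite_lmo(values, slopes, x, lo, hi):
    """Hypothetical oracle: min over y in [lo, hi] of
    max_i (values[i] + slopes[i] * (y - x))."""
    def model(y):
        return max(v + s * (y - x) for v, s in zip(values, slopes))

    # A max of lines is convex and piecewise linear, so its minimum over
    # an interval is at an endpoint or at a kink (pairwise intersection).
    candidates = [lo, hi]
    for (v1, s1), (v2, s2) in combinations(list(zip(values, slopes)), 2):
        if s1 != s2:  # parallel lines never intersect
            y = x + (v2 - v1) / (s1 - s2)
            if lo <= y <= hi:
                candidates.append(y)
    return min(candidates, key=model)


def fully_composite_fw(fs, grads, x0, lo, hi, iters=100):
    """Sketch of a Frank-Wolfe-type loop that linearizes only the smooth
    components f_i and keeps the outer max intact (assumed step size)."""
    x = x0
    for k in range(iters):
        values = [f(x) for f in fs]      # f(x_k)
        slopes = [g(x) for g in grads]   # derivative of each f_i at x_k
        y = composite_lmo(values, slopes, x, lo, hi)
        gamma = 2.0 / (k + 2)            # assumed standard schedule
        x = x + gamma * (y - x)          # convex combination stays in X
    return x


# Toy instance: phi(x) = max((x - 1)^2, (x + 1)^2) on X = [-2, 2],
# whose minimizer is x = 0 with phi(0) = 1.
fs = [lambda x: (x - 1.0) ** 2, lambda x: (x + 1.0) ** 2]
grads = [lambda x: 2.0 * (x - 1.0), lambda x: 2.0 * (x + 1.0)]


def phi(x):
    return max(f(x) for f in fs)


x_final = fully_composite_fw(fs, grads, x0=2.0, lo=-2.0, hi=2.0)
print(x_final, phi(x_final))
```

Setting \(F(u, x) = u\) with a single component would turn the oracle into the ordinary linear minimization oracle, which is the sense in which this loop generalizes classical Frank-Wolfe.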