Abstract: We consider the problem of minimizing a differentiable function with locally Lipschitz continuous gradient on a stratified set and present a first-order algorithm designed to find a stationary point of that problem. Our assumptions on the stratified set are satisfied notably by the determinantal variety (i.e., matrices of bounded rank), its intersection with the cone of positive-semidefinite matrices, and the set of nonnegative sparse vectors. The iteration map of the proposed algorithm applies a step of projected-projected gradient descent with backtracking line search, as proposed by Schneider and Uschmajew (2015), to its input but also to a projection of the input onto each of the lower strata to which it is considered close, and outputs a point among those thereby produced that maximally reduces the cost function. Under our assumptions on the stratified set, we prove that this algorithm produces a sequence whose accumulation points are stationary, and therefore does not follow the so-called apocalypses described by Levin, Kileel, and Boumal (2022). We illustrate the apocalypse-free property of our method through a numerical experiment on the determinantal variety.
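As a rough illustration of the iteration map described in the abstract, the following Python sketch specializes it to the determinantal variety (matrices of rank at most `r`). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`truncate`, `tangent_cone_projection`, `p2gd_step`, `p2gdr_step`), the parameters and their default values (`delta`, `alpha0`, `beta`, `c`, `tol`), the rule that a lower stratum counts as "close" when the discarded singular values are at most `delta`, the Armijo-type acceptance test, and the toy cost function are all assumptions made for this example.

```python
import numpy as np

def truncate(X, k):
    """A projection of X onto the matrices of rank at most k (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def tangent_cone_projection(X, D, r, tol=1e-12):
    """A projection of D onto the tangent cone to {rank <= r} at X (rank X <= r)."""
    U, s, Vt = np.linalg.svd(X)
    k = int(np.sum(s > tol))                  # numerical rank (tol is an assumption)
    PU = U[:, :k] @ U[:, :k].T                # projector onto the column space of X
    PV = Vt[:k, :].T @ Vt[:k, :]              # projector onto the row space of X
    tangent = PU @ D + D @ PV - PU @ D @ PV   # tangent-space component
    normal = D - tangent                      # component in the normal space
    return tangent + truncate(normal, r - k)  # add best rank-(r-k) part of it

def p2gd_step(f, grad, X, r, alpha0=1.0, beta=0.5, c=1e-4, max_backtracks=50):
    """One projected-projected gradient step with backtracking line search."""
    G = tangent_cone_projection(X, -grad(X), r)
    g2 = np.sum(G * G)
    fX = f(X)
    alpha = alpha0
    for _ in range(max_backtracks):
        Y = truncate(X + alpha * G, r)
        if f(Y) <= fX - c * alpha * g2:       # Armijo-type sufficient decrease
            return Y
        alpha *= beta
    return X                                  # no acceptable step found

def p2gdr_step(f, grad, X, r, delta=0.1, tol=1e-12):
    """Apply p2gd_step to X and to its projections onto nearby lower strata."""
    s = np.linalg.svd(X, compute_uv=False)
    rank = int(np.sum(s > tol))
    rank_delta = int(np.sum(s > delta))       # singular values <= delta are "small"
    candidates = [p2gd_step(f, grad, truncate(X, rank - j), r)
                  for j in range(rank - rank_delta + 1)]
    return min(candidates, key=f)             # keep the point with the lowest cost

# Toy instance (an assumption): rank <= 1 matrices in R^{2x2}, target E22.
E22 = np.array([[0.0, 0.0], [0.0, 1.0]])
f = lambda X: 0.5 * np.sum((X - E22) ** 2)
grad = lambda X: X - E22
X0 = np.diag([0.01, 0.0])                     # rank 1, close to the rank-0 stratum

Y_p2gd = p2gd_step(f, grad, X0, r=1)
Y_p2gdr = p2gdr_step(f, grad, X0, r=1)
```

On this toy instance, a single P2GD step from `X0` lands at (numerically) the zero matrix, while the candidate obtained by first projecting `X0` onto the rank-0 stratum reaches `E22`, so the sketch of the rank-reduction variant returns the latter.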

What problem does this paper attempt to address?

The problem this paper addresses is minimizing a differentiable function with locally Lipschitz continuous gradient on a stratified set. Specifically, the paper focuses on finding stationary points of this optimization problem on particular stratified sets (the determinantal variety, its intersection with the cone of positive-semidefinite matrices, and the set of nonnegative sparse vectors). The paper proposes a first-order algorithm (P2GDR) designed to avoid "apocalypses" (situations in which an algorithm's iterates converge to a non-stationary point even though the stationarity measure tends to zero along them), which existing methods may follow, and proves that, under appropriate assumptions, all accumulation points of the sequence generated by this algorithm are stationary. In addition, the paper illustrates the apocalypse-free behavior of P2GDR through a numerical experiment on the determinantal variety.

### Key problems

1. **Problem definition**:
   - Let \( E \) be a Euclidean vector space with inner product \( \langle \cdot, \cdot \rangle \) and induced norm \( \|\cdot\| \).
   - Consider a differentiable function \( f: E \to \mathbb{R} \) whose gradient is locally Lipschitz continuous.
   - Consider a nonempty closed subset \( C \subseteq E \).
   - The goal is to minimize \( f \) on \( C \), that is, to solve the problem:
     \[ \min_{x \in C} f(x) \tag{1} \]

2. **Definition of stationary points**:
   - A point \( x \in C \) is called a stationary point of problem (1) if it satisfies one of the following equivalent conditions:
     1. \( \langle \nabla f(x), v \rangle \geq 0 \) for all \( v \in T_C(x) \), where \( T_C(x) \) denotes the tangent cone to \( C \) at \( x \).
     2. \( -\nabla f(x) \in \hat{N}_C(x) \), where \( \hat{N}_C(x) \) denotes the regular normal cone to \( C \) at \( x \).
     3. \( s(x; f, C) = 0 \), where \( s(\cdot; f, C) \) is the stationarity measure, defined as:
        \[ s(x; f, C) = \| P_{T_C(x)}(-\nabla f(x)) \| \tag{2} \]

3. **Limitations of existing methods**:
   - Existing first-order methods (such as projected gradient descent, PGD) may converge to non-stationary points in some cases, especially in the presence of apocalyptic points.
   - A point \( x \in C \) is apocalyptic if there exist a sequence \( (x_i)_{i \in \mathbb{N}} \) in \( C \) converging to \( x \) and a continuously differentiable function \( \phi: E \to \mathbb{R} \) such that \( s(x_i; \phi, C) \to 0 \) but \( s(x; \phi, C) > 0 \).

4. **Contributions of the paper**:
   - Proposes a new algorithm, P2GDR, designed to find stationary points on stratified sets without following apocalypses.
   - Proves that, under appropriate assumptions, all accumulation points of the sequence generated by P2GDR are stationary.
   - Illustrates the behavior of P2GDR through a numerical experiment on the determinantal variety.

### Conclusion

The paper proposes a new first-order algorithm, P2GDR, for optimization problems on stratified sets. Under the paper's assumptions, every accumulation point of the sequence it generates is stationary, so the algorithm avoids the apocalypses that existing methods may follow. The paper supports this claim with a theoretical analysis and a numerical experiment.
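To make the measure \( s(x; f, C) \) in (2) and the condition \( s(x_i; \phi, C) \to 0 \) with \( s(x; \phi, C) > 0 \) concrete, here is a small Python sketch for the determinantal variety \( C = \{ X : \operatorname{rank} X \leq r \} \). It is a sketch under assumptions rather than code from the paper: the function name `stationarity_measure`, the rank tolerance `tol`, and the toy function \( \phi(X) = -X_{22} \) on \( 2 \times 2 \) matrices with \( r = 1 \) are chosen for illustration. It relies on the known description of the tangent cone to the determinantal variety (the tangent space to the fixed-rank manifold plus the matrices of rank at most \( r - \operatorname{rank} X \) in the normal space).

```python
import numpy as np

def stationarity_measure(X, G, r, tol=1e-12):
    """Norm of the projection of -G onto the tangent cone to {rank <= r} at X.

    G is the gradient of the cost function at X; tol (an assumption) is the
    threshold below which singular values are treated as zero.
    """
    U, sv, Vt = np.linalg.svd(X)
    k = int(np.sum(sv > tol))                 # numerical rank of X
    if k > r:
        raise ValueError("X must have rank at most r")
    PU = U[:, :k] @ U[:, :k].T                # projector onto the column space
    PV = Vt[:k, :].T @ Vt[:k, :]              # projector onto the row space
    D = -G
    tangent = PU @ D + D @ PV - PU @ D @ PV   # tangent-space component of -G
    normal = D - tangent                      # normal-space component of -G
    # Only the r - k largest singular values of the normal component survive
    # the projection onto the tangent cone.
    s_normal = np.linalg.svd(normal, compute_uv=False)
    extra_sq = np.sum(s_normal[: r - k] ** 2)
    return float(np.sqrt(np.sum(tangent ** 2) + extra_sq))

# Toy example (an assumption): phi(X) = -X[1, 1], so grad phi is constant.
grad_phi = np.array([[0.0, 0.0], [0.0, -1.0]])

# Rank-1 points x_i = diag(1/i, 0) converging to the zero matrix.
along_sequence = [stationarity_measure(np.diag([1.0 / i, 0.0]), grad_phi, r=1)
                  for i in (1, 10, 100, 1000)]
at_limit = stationarity_measure(np.zeros((2, 2)), grad_phi, r=1)
```

In this example the measure is zero at every point of the sequence but equals 1 at the limit, so the zero matrix is apocalyptic for the set of \( 2 \times 2 \) matrices of rank at most 1.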

First-order optimization on stratified sets

Low-rank optimization methods based on projected-projected gradient descent that accumulate at Bouligand stationary points

On high-order multilevel optimization strategies

First-Order Algorithms Without Lipschitz Gradient: A Sequential Local Optimization Approach

O(log T) Projections for Stochastic Optimization of Smooth and Strongly Convex Functions

First-Order Methods for Nonsmooth Nonconvex Functional Constrained Optimization with or without Slater Points

Zeroth-order Gradient and Quasi-Newton Methods for Nonsmooth Nonconvex Stochastic Optimization

Optimization with First Order Algorithms

Optimal First-Order Algorithms as a Function of Inequalities

Accelerated First-Order Optimization under Nonlinear Constraints

First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems

Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

Near-Optimal Fully First-Order Algorithms for Finding Stationary Points in Bilevel Optimization

An interior proximal gradient method for nonconvex optimization

Minimization Over the Nonconvex Sparsity Constraint Using A Hybrid First-order method

An Alternating Structure-Adapted Bregman Proximal Gradient Descent Algorithm for Constrained Nonconvex Nonsmooth Optimization Problems and Its Inertial Variant

Constrained, Global Optimization of Functions with Lipschitz Continuous Gradients

Optimizing $(L_0, L_1)$-Smooth Functions by Gradient Methods

Goldstein Stationarity in Lipschitz Constrained Optimization

An Infeasible-Point Subgradient Method Using Adaptive Approximate Projections

Optimization over bounded-rank matrices through a desingularization enables joint global and local guarantees