Abstract: We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods by maximizing the gradient norm on a level-set of the objective function. Since the descent lemma implies that gradient descent (GD) decreases the objective proportional to the squared norm of the gradient, level-set teleportation maximizes this one-step progress guarantee. For convex functions satisfying Hessian stability, we prove that GD with level-set teleportation obtains a combined sub-linear/linear convergence rate which is strictly faster than standard GD when the optimality gap is small. This is in sharp contrast to the standard (strongly) convex setting, where we show level-set teleportation neither improves nor worsens convergence rates. To evaluate teleportation in practice, we develop a projected-gradient-type method requiring only Hessian-vector products. We use this method to show that gradient methods with access to a teleportation oracle uniformly out-perform their standard versions on a variety of learning problems.
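The one-step argument in the abstract can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a hypothetical diagonal quadratic `f(w) = 0.5 * sum(h_i * w_i^2)`, for which the maximizer of the gradient norm on a level set has a closed form (all mass moves to the highest-curvature coordinate). The helper names (`teleport_quadratic`, `descent_bound`) and the starting point are illustrative choices.

```python
import math

# Hypothetical test problem (an assumption, not from the paper):
# f(w) = 0.5 * sum(h_i * w_i^2), a diagonal quadratic.
H = [1.0, 10.0]   # per-coordinate curvatures
L = max(H)        # smoothness constant of f

def f(w):
    return 0.5 * sum(h * x * x for h, x in zip(H, w))

def grad(w):
    return [h * x for h, x in zip(H, w)]

def grad_sq_norm(w):
    return sum(g * g for g in grad(w))

def teleport_quadratic(w):
    """Closed-form maximizer of ||grad f||^2 on the level set {f = f(w)}.
    Valid only for this diagonal quadratic."""
    i = H.index(L)
    out = [0.0] * len(w)
    out[i] = math.sqrt(2.0 * f(w) / L)
    return out

def gd_step(w, eta):
    return [x - eta * g for x, g in zip(w, grad(w))]

def descent_bound(w, eta):
    """Right-hand side of the descent lemma:
    f(w) - eta * (1 - eta * L / 2) * ||grad f(w)||^2."""
    return f(w) - eta * (1.0 - eta * L / 2.0) * grad_sq_norm(w)

w_k = [2.0, 0.1]                  # arbitrary starting point
w_plus = teleport_quadratic(w_k)  # same objective value, larger gradient
eta = 1.0 / L

print("f(w_k), f(w_plus):", f(w_k), f(w_plus))
print("squared gradient norms:", grad_sq_norm(w_k), grad_sq_norm(w_plus))
print("bound without / with teleport:",
      descent_bound(w_k, eta), descent_bound(w_plus, eta))
print("f after one GD step without / with teleport:",
      f(gd_step(w_k, eta)), f(gd_step(w_plus, eta)))
```

On this particular quadratic with step size `1/L`, the teleported step lands essentially at the minimizer. That is a property of the toy problem and says nothing about worst-case rates, which the abstract reports are unchanged in the strongly convex setting.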

What problem does this paper attempt to address?

The paper asks whether gradient descent can be accelerated by a sub-routine called level set teleportation. Teleportation improves the one-step progress guarantee by maximizing the gradient norm on a level set of the objective function. For convex functions satisfying Hessian stability, the paper proves that gradient descent with level set teleportation attains a combined sub-linear/linear convergence rate that is faster than standard gradient descent. The paper also develops a projected-gradient-type method that requires only Hessian-vector products to solve the teleportation problem, and shows experimentally that gradient methods with teleportation perform better on a variety of learning tasks.

### Main contributions of the paper:

1. **Theoretical analysis**:
   - For strongly convex functions, gradient descent with level set teleportation cannot be faster than standard gradient descent in the worst case, unless an adaptive step size is used.
   - For convex functions satisfying Hessian stability, a new proof technique combines the sub-linear progress of standard gradient descent with the linear progress obtained after teleportation, giving a convergence rate strictly faster than \(O(1/K)\).
2. **Algorithm development**:
   - A fast algorithm that requires no parameter tuning is developed to solve the level set teleportation problem accurately. It is based on sequential quadratic programming (SQP), requires only Hessian-vector products, and works by linearizing the level set constraint.
3. **Experimental verification**:
   - Level set teleportation is evaluated in multiple experiments, including training a two-layer MLP on the MNIST dataset and training a three-layer ReLU network on the UCI dataset, among others.
   - The results show that optimization methods with teleportation beat their standard versions in convergence speed and in computing high-precision solutions.

### Key formulas:

- **Single-step progress guarantee of gradient descent**:
  \[ f(w_{k + 1}) \leq f(w_k)-\eta_k\left(1-\frac{\eta_kL}{2}\right)\|\nabla f(w_k)\|^2_2 \]
  where \(\eta_k < \frac{2}{L}\).
- **Level set teleportation problem**:
  \[ w^+_k\in\arg\max_w\frac{1}{2}\|\nabla f(w)\|^2_2\quad\text{s.t.}\quad f(w)\leq f(w_k) \]
- **Linear progress with Hessian stability**:
  \[ \delta_{k + 1}\leq\left(1-\frac{2\tilde{\mu}\lambda_k\eta}{\eta\lambda_k\tilde{L}/2 - 1}\right)\delta_k \]

### Conclusion:

Through theoretical analysis and experiments, the paper shows the potential of level set teleportation in optimization. In particular, when the objective function satisfies Hessian stability, teleportation can significantly accelerate the convergence of gradient descent. This provides new ideas and tools for the design of optimization algorithms.
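The teleportation problem and the idea of linearizing the level set constraint can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the objective is a hypothetical diagonal quadratic, the step size `rho` and iteration count are fixed by hand (whereas the summary above describes the paper's method as tuning-free), and the names `hvp` and `teleport_sketch` are invented for this example. Each iteration takes an ascent step on \(\frac{1}{2}\|\nabla f(x)\|^2_2\), whose gradient is the Hessian-vector product \(\nabla^2 f(x)\nabla f(x)\), and then projects onto the half-space given by the linearized constraint \(f(x) + \langle \nabla f(x), z - x\rangle \leq f(w_k)\).

```python
# Hypothetical test problem (an assumption, not from the paper):
# f(w) = 0.5 * sum(h_i * w_i^2), a diagonal quadratic.
H = [1.0, 10.0]

def f(w):
    return 0.5 * sum(h * x * x for h, x in zip(H, w))

def grad(w):
    return [h * x for h, x in zip(H, w)]

def hvp(w, v):
    """Hessian-vector product. For this quadratic the Hessian is diag(H),
    so the product does not depend on w."""
    return [h * x for h, x in zip(H, v)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def teleport_sketch(w_k, rho=0.01, iters=200):
    """Projected-gradient-style ascent on 0.5 * ||grad f||^2 with a
    linearized level set constraint. rho and iters are illustrative."""
    f_k = f(w_k)
    x = list(w_k)
    for _ in range(iters):
        g = grad(x)
        q = hvp(x, g)          # gradient of 0.5 * ||grad f(x)||^2
        gg = dot(g, g)
        if gg == 0.0:          # stationary point: no direction to project along
            break
        # Ascent step y = x + rho * q, then Euclidean projection of y onto
        # the half-space {z : f(x) + <g, z - x> <= f_k}.
        violation = f(x) - f_k + rho * dot(g, q)
        s = max(0.0, violation) / gg
        x = [xi + rho * qi - s * gi for xi, qi, gi in zip(x, q, g)]
    return x

w_k = [2.0, 0.1]
w_plus = teleport_sketch(w_k)
print("objective before / after:", f(w_k), f(w_plus))
print("squared gradient norm before / after:",
      dot(grad(w_k), grad(w_k)), dot(grad(w_plus), grad(w_plus)))
```

Because the linearized half-space is an outer approximation of a convex sub-level set, intermediate iterates can sit slightly above `f(w_k)`; the `f(x) - f_k` term in the projection pulls them back on the next iteration. For a non-quadratic objective, `hvp` would typically come from automatic differentiation rather than a closed form.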

Level Set Teleportation: An Optimization Perspective
