Level Set Teleportation: An Optimization Perspective

Aaron Mishkin, Alberto Bietti, Robert M. Gower
2024-03-06
Abstract: We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods by maximizing the gradient norm on a level-set of the objective function. Since the descent lemma implies that gradient descent (GD) decreases the objective proportional to the squared norm of the gradient, level-set teleportation maximizes this one-step progress guarantee. For convex functions satisfying Hessian stability, we prove that GD with level-set teleportation obtains a combined sub-linear/linear convergence rate which is strictly faster than standard GD when the optimality gap is small. This is in sharp contrast to the standard (strongly) convex setting, where we show level-set teleportation neither improves nor worsens convergence rates. To evaluate teleportation in practice, we develop a projected-gradient-type method requiring only Hessian-vector products. We use this method to show that gradient methods with access to a teleportation oracle uniformly out-perform their standard versions on a variety of learning problems.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
The paper aims to accelerate gradient descent (GD) with a sub-routine called level set teleportation. Teleportation seeks the point with the largest gradient norm on the level set of the objective function, which maximizes the one-step progress guarantee of GD. For convex functions satisfying Hessian stability, the paper proves that GD with level set teleportation obtains a combined sub-linear/linear convergence rate that is faster than standard GD. The paper also develops a projected-gradient-type method that requires only Hessian-vector products to solve the teleportation problem, and shows experimentally that gradient methods with teleportation perform better on a variety of learning tasks.

### Main contributions of the paper:

1. **Theoretical analysis**:
   - For strongly convex functions, GD with level set teleportation cannot be faster than standard GD in the worst case unless an adaptive step size is used.
   - For convex functions satisfying Hessian stability, a new proof technique combines the sub-linear progress of standard GD with the linear progress made after teleportation, giving a convergence rate strictly faster than \(O(1/K)\).
2. **Algorithm development**:
   - A fast, tuning-free algorithm is developed to solve the level set teleportation problem accurately. It is based on sequential quadratic programming (SQP), linearizes the level set constraint, and requires only Hessian-vector products.
3. **Experimental verification**:
   - Teleportation is evaluated in multiple experiments, including training a two-layer MLP on the MNIST dataset and training a three-layer ReLU network on the UCI dataset. The results show that optimization methods with teleportation converge faster than the standard methods and are better at computing high-precision solutions.

### Key formulas:

- **Single-step progress guarantee of gradient descent**:
  \[ f(w_{k + 1}) \leq f(w_k)-\eta_k\left(1-\frac{\eta_kL}{2}\right)\|\nabla f(w_k)\|^2_2 \]
  where \(\eta_k < \frac{2}{L}\).
- **Level set teleportation problem**:
  \[ w^+_k\in\arg\max_w\frac{1}{2}\|\nabla f(w)\|^2_2\quad\text{s.t.}\quad f(w)\leq f(w_k) \]
- **Linear progress with Hessian stability**:
  \[ \delta_{k + 1}\leq\left(1-\frac{2\tilde{\mu}\lambda_k\eta}{\eta\lambda_k\tilde{L}/2 - 1}\right)\delta_k \]

### Conclusion:

Through theoretical analysis and experiments, the paper shows the potential of level set teleportation in optimization. When the objective function satisfies Hessian stability, teleportation can significantly accelerate the convergence of gradient descent. This provides new ideas and tools for the design of optimization algorithms.
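The single-step progress guarantee and the teleportation problem listed under the key formulas can be checked numerically on a toy problem. The following is a minimal sketch, not the paper's method: it assumes a two-dimensional diagonal quadratic, for which the teleportation problem has a closed-form solution (all of the objective value is moved onto the coordinate with the largest curvature), so no SQP solver or Hessian-vector product is needed. The names `A`, `teleport`, and `guaranteed_decrease` are hypothetical and chosen only for illustration.

```python
import math

# Toy objective (an assumption for illustration): f(w) = 0.5 * sum(a_i * w_i^2).
# Its smoothness constant L is the largest curvature a_i.
A = [1.0, 10.0]
L = max(A)

def f(w):
    return 0.5 * sum(a * x * x for a, x in zip(A, w))

def grad(w):
    return [a * x for a, x in zip(A, w)]

def sq_norm(v):
    return sum(x * x for x in v)

def gd_step(w, eta):
    # One gradient descent step with step size eta.
    return [x - eta * g for x, g in zip(w, grad(w))]

def guaranteed_decrease(w, eta):
    # Decrease promised by the descent lemma: eta * (1 - eta*L/2) * ||grad f(w)||^2.
    return eta * (1.0 - eta * L / 2.0) * sq_norm(grad(w))

def teleport(w):
    # Closed-form maximizer of ||grad f||^2 subject to f(w') <= f(w).
    # Valid for this diagonal quadratic only: put the whole objective value
    # on the coordinate with the largest curvature.
    c = f(w)
    i = max(range(len(A)), key=lambda j: A[j])
    out = [0.0] * len(A)
    out[i] = math.sqrt(2.0 * c / A[i])
    return out

w = [3.0, 0.1]
eta = 0.5 / L            # any step size below 2/L satisfies the descent lemma
w_plus = teleport(w)

print("f(w), f(w+):", f(w), f(w_plus))
print("||grad||^2 before/after teleport:", sq_norm(grad(w)), sq_norm(grad(w_plus)))
print("guaranteed decrease before/after:",
      guaranteed_decrease(w, eta), guaranteed_decrease(w_plus, eta))
print("f after one GD step without/with teleport:",
      f(gd_step(w, eta)), f(gd_step(w_plus, eta)))
```

The teleported point has the same objective value as `w` but a larger gradient norm, so the descent lemma promises a larger decrease from the next step. How large the gain is here depends on the toy quadratic and should not be read as a statement about general objectives.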