Abstract: Adaptive gradient methods have shown excellent performance in solving many machine learning problems. Although multiple adaptive gradient methods have recently been studied, they mainly focus on either empirical or theoretical aspects, and they only work for specific problems by using specific adaptive learning rates. It is therefore desirable to design a universal framework for practical adaptive gradient algorithms with theoretical guarantees for solving general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate momentum and variance reduction techniques. In particular, our novel framework provides convergence analysis support for adaptive gradient methods in the nonconvex setting. In the theoretical analysis, we prove that our SUPER-ADAM algorithm achieves the best known gradient (i.e., stochastic first-order oracle (SFO)) complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we use various deep learning tasks to validate that our algorithm consistently outperforms existing adaptive algorithms. Code is available at https://github.com/LIJUNYI95/SuperAdam
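The abstract describes the framework only at a high level. The following is a minimal sketch, assuming a particular instantiation, of how an adaptive matrix H_t can be paired with either a momentum estimator or a STORM-style variance-reduced estimator. The toy quadratic objective, the diagonal Adam-like choice H_t = diag(sqrt(v_t) + lam), the two-step update controlled by `gamma` and `mu`, and every function and parameter name here are hypothetical choices for illustration. This is not the code in the linked repository.

```python
import math
import random


def stochastic_grad(x, noise):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2 plus a sampled noise vector.

    Reusing one `noise` vector at two points mimics evaluating both on the same sample.
    """
    return [xi + ni for xi, ni in zip(x, noise)]


def adaptive_vr_sketch(x0, steps=200, gamma=0.1, mu=0.5, alpha=0.1, beta=0.999,
                       lam=1e-3, noise_scale=0.1, estimator="storm", seed=0):
    """Hypothetical adaptive-gradient loop; not the code in the SUPER-ADAM repository.

    Assumed update:
        x_tilde = x_t - gamma * H_t^{-1} m_t
        x_{t+1} = x_t + mu * (x_tilde - x_t)
    with a diagonal, Adam-like adaptive matrix H_t = diag(sqrt(v_t) + lam).
    """
    if estimator not in ("storm", "momentum"):
        raise ValueError("estimator must be 'storm' or 'momentum'")
    rng = random.Random(seed)
    d = len(x0)

    def sample_noise():
        return [noise_scale * rng.gauss(0.0, 1.0) for _ in range(d)]

    x = list(x0)
    m = stochastic_grad(x, sample_noise())  # initial estimator: one stochastic gradient
    v = [mi * mi for mi in m]               # second-moment estimate
    for _ in range(steps):
        # Diagonal adaptive matrix; lam keeps every entry strictly positive.
        h = [math.sqrt(vi) + lam for vi in v]
        x_tilde = [xi - gamma * mi / hi for xi, mi, hi in zip(x, m, h)]
        x_prev = x
        x = [xi + mu * (xt - xi) for xi, xt in zip(x_prev, x_tilde)]

        noise = sample_noise()              # one new sample for this step
        g_new = stochastic_grad(x, noise)
        if estimator == "storm":
            # STORM-style recursive momentum: same sample evaluated at x_{t+1} and x_t.
            g_old = stochastic_grad(x_prev, noise)
            m = [gn + (1.0 - alpha) * (mi - go) for gn, mi, go in zip(g_new, m, g_old)]
        else:
            # Plain exponential-moving-average momentum.
            m = [(1.0 - alpha) * mi + alpha * gn for mi, gn in zip(m, g_new)]
        v = [beta * vi + (1.0 - beta) * gn * gn for vi, gn in zip(v, g_new)]
    return x
```

For example, `adaptive_vr_sketch([1.0, -2.0, 3.0])` should return a point much closer to the minimizer at the origin than the starting point, and passing `estimator="momentum"` swaps the recursive variance-reduced estimator for a plain moving average while leaving the adaptive matrix unchanged. Keeping the estimator and the adaptive matrix as separate, interchangeable pieces is the kind of flexibility the abstract attributes to the framework.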

Efficient Adaptive Online Learning Via Frequent Directions

Accelerating Adaptive Online Learning by Matrix Approximation

Robust Frequent Directions with Application in Online Learning

Online Learning Via Regularized Frequent Directions

Faster Projection-free Online Learning

A Full Adagrad algorithm with O(Nd) operations

Improving Adaptive Online Learning Using Refined Discretization

A Unified View of Regularized Dual Averaging and Mirror Descent with Implicit Updates

Faster Adaptive Decentralized Learning Algorithms

Online Alternating Direction Method (longer version)

Minimizing Adaptive Regret with One Gradient Per Iteration

Adaptive Online Learning in Dynamic Environments

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Efficient Methods for Non-stationary Online Learning

Efficient Projection-Free Online Methods with Stochastic Recursive Gradient

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

AdaDB: an Adaptive Gradient Method with Data-Dependent Bound

Finite-sum optimization: Adaptivity to smoothness and loopless variance reduction

AdaX: Adaptive Gradient Descent with Exponential Long Term Memory

ADADELTA: An Adaptive Learning Rate Method