Abstract: An algorithm is said to be adaptive to a certain parameter of the problem if it does not need a priori knowledge of that parameter but performs competitively with algorithms that know it. This dissertation presents our work on adaptive algorithms in the following scenarios:

1. In the stochastic optimization setting, we only receive stochastic gradients, and the level of noise in evaluating them greatly affects the convergence rate. Without prior knowledge of the noise scale, tuning is typically required to achieve the optimal rate. Considering this, we designed and analyzed noise-adaptive algorithms that automatically ensure (near-)optimal rates under different noise scales without knowing them.

2. In training deep neural networks, the scales of gradient magnitudes in each coordinate can scatter across a very wide range unless normalization techniques, like BatchNorm, are employed. In such situations, algorithms that do not address this problem of gradient scales can behave very poorly. To mitigate this, we formally established the advantage of scale-free algorithms that adapt to the gradient scales and demonstrated their benefits in empirical experiments.

3. Traditional analyses in non-convex optimization typically rely on the smoothness assumption. Yet this condition does not capture the properties of some deep learning objective functions, including those involving Long Short-Term Memory networks and Transformers. Instead, these objectives satisfy a much more relaxed condition, with potentially unbounded smoothness. Under this condition, we show that a generalized SignSGD algorithm can theoretically match the best-known convergence rates obtained by SGD with gradient clipping without needing explicit clipping at all, and that it can empirically match the performance of Adam and outperform others. Moreover, it can also be made to adapt automatically to the unknown relaxed smoothness.
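As a rough illustration of the scale-free idea in the second scenario, the sketch below compares a plain SGD step with a basic sign-based step when each gradient coordinate is multiplied by a different positive constant. This is not the dissertation's algorithm: the function names (`sgd_step`, `sign_sgd_step`, `rescale`) and all numeric values are hypothetical, and plain sign descent is used only because its invariance to positive rescaling is easy to verify.

```python
# Illustrative sketch only: plain sign descent, not the generalized SignSGD
# analyzed in the dissertation. All names and values here are hypothetical.

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sgd_step(x, grad, lr):
    """Plain SGD: the step length in each coordinate scales with the gradient."""
    return [xi - lr * gi for xi, gi in zip(x, grad)]

def sign_sgd_step(x, grad, lr):
    """Sign descent: only the sign of each gradient coordinate is used."""
    return [xi - lr * sign(gi) for xi, gi in zip(x, grad)]

def rescale(grad, scales):
    """Multiply each gradient coordinate by its own positive scale."""
    return [s * g for s, g in zip(scales, grad)]

x = [1.0, -2.0, 0.5]
grad = [0.3, -4.0, 1e-6]
scales = [1e3, 1e-3, 1e6]  # wildly different per-coordinate scales
lr = 0.1

scaled_grad = rescale(grad, scales)

# The sign-based update ignores the rescaling; the SGD update does not.
sign_update_is_scale_free = (
    sign_sgd_step(x, grad, lr) == sign_sgd_step(x, scaled_grad, lr)
)
sgd_update_is_scale_free = (
    sgd_step(x, grad, lr) == sgd_step(x, scaled_grad, lr)
)

print("sign-based step unchanged by rescaling:", sign_update_is_scale_free)
print("SGD step unchanged by rescaling:", sgd_update_is_scale_free)
```

In this toy setting the sign-based step is identical before and after rescaling, while the SGD step changes by orders of magnitude in some coordinates, which is the kind of sensitivity to gradient scales that the abstract describes.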

Improving Adaptive Online Learning Using Refined Discretization

Adaptive Online Learning in Dynamic Environments

Discounted Adaptive Online Learning: Towards Better Regularization

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

Efficient Methods for Non-stationary Online Learning

Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach

Minimizing Adaptive Regret with One Gradient Per Iteration

Gradient-Variation Online Learning under Generalized Smoothness

Adaptive Strategies in Non-convex Optimization

Revisiting Smoothed Online Learning

Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret

Dual Adaptivity: A Universal Algorithm for Minimizing the Adaptive Regret of Convex Functions

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

Online Alternating Direction Method (longer version)

Isotuning With Applications To Scale-Free Online Learning

Adaptive debiased SGD in high-dimensional GLMs with streaming data

Accelerated Gradient Algorithms with Adaptive Subspace Search for Instance-Faster Optimization

Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions

A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization

Online Stackelberg Optimization via Nonlinear Control

Strongly adaptive online learning over partial intervals