Abstract: We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
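To make the online linear regression setting concrete, here is a minimal simulation sketch. It assumes an isotropic Gaussian design and regression coefficients that drift as a Gaussian random walk; both are illustrative assumptions, not the paper's experimental setup. It runs plain SGD on the squared loss and accumulates dynamic regret, measured as the excess loss 0.5·‖θ_t − θ*_t‖² of each iterate against the current target. The function `simulate` and the two schedules are hypothetical names, and the schedules are generic baselines rather than the optimal schedules derived in the paper.

```python
import random


def simulate(schedule, T=5000, d=5, drift=0.05, noise=0.5, seed=0):
    """Online SGD on a linear-regression stream whose coefficients drift.

    Returns the cumulative dynamic regret sum_t 0.5 * ||theta_t - theta*_t||^2,
    i.e. the expected excess squared loss of each iterate when x ~ N(0, I).
    """
    rng = random.Random(seed)
    theta_star = [rng.gauss(0.0, 1.0) for _ in range(d)]  # current true coefficients
    theta = [0.0] * d  # SGD iterate
    regret = 0.0
    for t in range(T):
        # Excess risk of the current model against the current target.
        regret += 0.5 * sum((a - b) ** 2 for a, b in zip(theta, theta_star))
        # Draw one example from the current distribution.
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(xi * si for xi, si in zip(x, theta_star)) + noise * rng.gauss(0.0, 1.0)
        # One SGD step on the squared loss 0.5 * (x . theta - y)^2.
        residual = sum(xi * ti for xi, ti in zip(x, theta)) - y
        eta = schedule(t)
        theta = [ti - eta * residual * xi for ti, xi in zip(theta, x)]
        # Distribution shift: the coefficients follow a Gaussian random walk.
        theta_star = [si + drift * rng.gauss(0.0, 1.0) for si in theta_star]
    return regret


def constant_schedule(t):
    # Hypothetical baseline: a learning rate that never decays.
    return 0.05


def decaying_schedule(t):
    # Hypothetical baseline: a classic 1/t-style decay toward zero.
    return 1.0 / (t + 10)


if __name__ == "__main__":
    print("constant:", simulate(constant_schedule))
    print("decaying:", simulate(decaying_schedule))
```

Because the random draws do not depend on the iterate, both schedules see exactly the same data stream for a given seed. Under drift, a rate that decays to zero effectively averages over stale targets and stops tracking, so the constant schedule should accumulate far less regret; this is only a toy illustration of why shift favors larger learning rates, not a reproduction of the paper's results.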

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper studies how to design learning rate schedules for online learning with stochastic gradient descent (SGD) that minimize dynamic regret when the data distribution changes over time. Specifically, it addresses the following questions:

1. **Linear Regression**
   - How to design the optimal learning rate schedule that minimizes dynamic regret when the regression coefficients vary over time.
   - How to approximate the dynamics of SGD under distribution shift with a new stochastic differential equation (SDE), and how to derive the optimal learning rate by analyzing that equation.
2. **General Convex Loss Functions**
   - Proposing new learning rate schedules for general convex loss functions that are robust to distribution shift.
   - Providing upper and lower bounds on the dynamic regret that differ only by constants.
3. **Non-Convex Loss Functions**
   - Defining a notion of regret based on the gradient norm of the estimated models and proposing a learning rate schedule that minimizes an upper bound on the total expected regret.
   - Validating these learning rate schedules experimentally on high-dimensional regression models and neural networks.

### Main Contributions

1. **Linear Regression**
   - A novel stochastic differential equation (SDE) that approximates the dynamics of SGD under distribution shift.
   - A derivation of the optimal learning rate schedule, supported by theoretical analysis and experiments.
2. **General Convex Loss Functions**
   - Adaptive learning rate schedules derived from upper and lower bounds on the dynamic regret.
   - A proof that, for strongly convex loss functions, the upper and lower bounds have the same form and differ only in constant terms.
3. **Non-Convex Loss Functions**
   - A modified definition of regret that measures performance through the gradient norm of the model.
   - A learning rate schedule that minimizes an upper bound on the expected cumulative regret, validated through experiments.

### Experimental Validation

- **High-Dimensional Regression Models**: Experiments comparing the performance of different learning rate schedules on high-dimensional regression models.
- **Medical Applications**: Dynamic learning rate schedules are used to classify continuously arriving small RNA data in flow cytometry, validating the effectiveness of the methods.

### Summary

Combining theoretical analysis with experiments, the paper systematically studies how to design learning rate schedules that minimize dynamic regret under continuously changing data distributions. The schedules cover linear regression, general convex losses, and non-convex losses, and offer theoretical and practical guidance for online learning systems.
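The gradient-norm notion of regret for non-convex losses can be illustrated with a one-dimensional toy problem. The sketch below assumes the hypothetical loss f_t(w) = 1 − cos(w − c_t), whose center c_t follows a Gaussian random walk, and takes the regret at step t to be the squared gradient f_t'(w_t)²; the squared form is an assumption here, and the paper's exact definition may differ. `grad_norm_regret` and the two schedules are illustrative names, not the paper's code or its proposed schedule.

```python
import math
import random


def grad_norm_regret(schedule, T=20000, drift=0.01, noise=0.3, seed=1):
    """Cumulative gradient-norm regret sum_t f_t'(w_t)^2 on a drifting 1-D loss.

    Hypothetical loss f_t(w) = 1 - cos(w - c_t): non-convex in w, with
    minimizers at c_t + 2*pi*k. The center c_t follows a Gaussian random walk.
    """
    rng = random.Random(seed)
    c = 0.0  # current center of the loss (one of its minimizers)
    w = 1.0  # model parameter, started away from the minimizer
    regret = 0.0
    for t in range(T):
        true_grad = math.sin(w - c)  # f_t'(w_t)
        regret += true_grad ** 2
        # The learner only observes a noisy gradient.
        g = true_grad + noise * rng.gauss(0.0, 1.0)
        w -= schedule(t) * g
        # Distribution shift: the loss landscape moves before the next round.
        c += drift * rng.gauss(0.0, 1.0)
    return regret


def constant_schedule(t):
    # Hypothetical baseline: a learning rate that never decays.
    return 0.1


def decaying_schedule(t):
    # Hypothetical baseline: a 1/t-style decay toward zero.
    return 1.0 / (t + 10)


if __name__ == "__main__":
    for drift in (0.0, 0.01):
        r_const = grad_norm_regret(constant_schedule, drift=drift)
        r_decay = grad_norm_regret(decaying_schedule, drift=drift)
        print(f"drift={drift}: constant={r_const:.1f}, decaying={r_decay:.1f}")
```

Setting `drift=0.0` recovers a stationary problem, in which the decaying schedule accumulates less gradient-norm regret because it averages out the gradient noise; with drift, the ordering flips, since a vanishing step size can no longer follow the moving minimizer.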

Learning Rate Schedules in the Presence of Distribution Shift
