Abstract: In \citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in \citep{Hazan-2008-extract} when applied to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials.
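The abstract's two central quantities, regret and variation, can be made concrete in the linear case with a minimal Python/NumPy sketch. It assumes linear costs \( c_t(\mathbf{x}) = \langle \mathbf{f}_t, \mathbf{x} \rangle \) over the probability simplex, and all function names are hypothetical. The abstract does not give a formula for the "total variation of the cost vectors": `mean_variation` assumes the deviation-from-the-mean form \( \sum_t \|\mathbf{f}_t - \boldsymbol{\mu}\|_2^2 \), while `sequential_variation` computes the round-to-round variation \( \text{VAR}_s^T \) defined in the summary.

```python
import numpy as np

def linear_regret(decisions, costs):
    """Regret of x_1..x_T against linear costs c_t(x) = <f_t, x> on the
    probability simplex P. A linear function is minimized at a vertex, so
    min_{x in P} sum_t <f_t, x> is the smallest entry of sum_t f_t."""
    decisions = np.asarray(decisions, dtype=float)
    costs = np.asarray(costs, dtype=float)
    incurred = float(np.sum(decisions * costs))    # sum_t <f_t, x_t>
    best_fixed = float(np.min(costs.sum(axis=0)))  # best vertex in hindsight
    return incurred - best_fixed

def mean_variation(costs):
    """ASSUMED form of the total variation: sum_t ||f_t - mu||^2,
    where mu is the average cost vector."""
    costs = np.asarray(costs, dtype=float)
    mu = costs.mean(axis=0)
    return float(np.sum((costs - mu) ** 2))

def sequential_variation(costs):
    """sum_{t=1}^{T-1} ||f_{t+1} - f_t||^2. For linear costs the gradient of
    c_t is f_t at every x, so the max over x in VAR_s^T is trivial."""
    costs = np.asarray(costs, dtype=float)
    return float(np.sum(np.diff(costs, axis=0) ** 2))

# A slowly drifting cost sequence: costs move far from their mean overall,
# but consecutive cost vectors barely differ.
drift = [[t / 10, 0.0] for t in range(11)]
print(mean_variation(drift), sequential_variation(drift))
```

On the drifting sequence the first quantity is about 1.1 and the second about 0.1, which illustrates that the two notions of variation can differ substantially on the same cost sequence.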

What problem does this paper attempt to address?

The core problem this paper addresses is how to bound the regret in Online Convex Optimization (OCO) in terms of the variation of the cost functions. Specifically, the authors develop algorithms whose regret depends on how much the cost functions change over time, so that the guarantee becomes much tighter than a worst-case bound when the cost functions change slowly.

### Background and Motivation

Online convex optimization is an iterative decision-making process: in each round \( t \), the decision-maker selects a decision vector \( \mathbf{x}_t \in P \), then receives a convex cost function \( c_t(\mathbf{x}) \) and incurs the cost \( c_t(\mathbf{x}_t) \). The goal is to minimize the cumulative regret:

\[ \text{regret} = \sum_{t = 1}^T c_t(\mathbf{x}_t)-\min_{\mathbf{x}\in P}\sum_{t = 1}^T c_t(\mathbf{x}) \]

Previous research has mainly bounded the regret as a function of the number of trials \( T \), for example \( O(\sqrt{T}) \). Such bounds are worst-case: they do not improve when the cost functions change little from round to round. This paper therefore bounds the regret in terms of the variation of the cost functions instead.

### Main Contributions

1. **Analysis of the Limitations of the FTRL Algorithm**: The authors first analyze the limitations of the Follow the Regularized Leader (FTRL) algorithm when applied to online convex optimization and show that applying FTRL directly may not achieve the desired bound. In particular, even when all cost functions are identical or change very little, the regret of FTRL may still grow at a rate of \( O(\sqrt{T}) \).
2. **Introduction of Sequential Variation**: To measure how the cost functions change, the authors introduce the sequential variation, defined as:
\[ \text{VAR}_s^T=\sum_{t = 1}^{T - 1}\max_{\mathbf{x}\in P}\|\nabla c_{t + 1}(\mathbf{x})-\nabla c_t(\mathbf{x})\|_2^2 \]
This definition captures the change in the gradient of the cost function between adjacent rounds.
3. **Two New Algorithms**:
   - **Improved FTRL Algorithm**: maintains two sequences (decision vectors and search vectors) and bounds the regret by the extended sequential variation \( \text{EVAR}_s^T \).
   - **Prox Method (based on the mirror-prox method)**: also maintains two sequences and bounds the regret through a similar mechanism.
4. **Theoretical Results**: The authors prove that the regret of both new algorithms is bounded in terms of the (extended) sequential variation:
\[ \sum_{t = 1}^T c_t(\mathbf{x}_t)-\min_{\mathbf{x}\in P}\sum_{t = 1}^T c_t(\mathbf{x})\leq O(\sqrt{\text{EVAR}_s^T})+\text{constant} \]

### Conclusions and Prospects

This research proposes two online convex optimization algorithms whose regret is bounded by the variation of the cost functions, which gives tighter guarantees when the cost functions change slowly and opens a new direction for further research. As stated in the abstract, the paper also considers the bandit setting, presenting a randomized algorithm for online bandit convex optimization with a variation-based regret bound that is optimal when the variation is independent of the number of trials. Future work can explore how to further reduce the dependence of the regret on the number of trials \( T \).
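The two-sequence idea behind the prox method can be illustrated with a minimal Python/NumPy sketch. It assumes Euclidean projections onto a unit ball \( P \) and quadratic costs \( c_t(\mathbf{x}) = \tfrac12\|\mathbf{x}-\mathbf{a}_t\|_2^2 \) (both hypothetical choices), and uses an optimistic update in which the decision \( \mathbf{x}_t \) steps from the search point \( \mathbf{z}_{t-1} \) using the previous gradient \( \nabla c_{t-1}(\mathbf{x}_{t-1}) \) as a guess of the next one. It follows the general mirror-prox pattern rather than reproducing the paper's exact pseudocode, and the step size `eta` is an arbitrary choice. For these costs \( \nabla c_{t+1}(\mathbf{x}) - \nabla c_t(\mathbf{x}) = \mathbf{a}_t - \mathbf{a}_{t+1} \) for every \( \mathbf{x} \), so \( \text{VAR}_s^T \) can be computed exactly.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto P = {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def optimistic_prox(targets, eta=0.5, radius=1.0):
    """Play x_t against c_t(x) = 0.5 * ||x - a_t||^2 with two sequences:
    x_t steps from z_{t-1} using the previous gradient (a guess of the next),
    z_t steps from z_{t-1} using the gradient actually observed at x_t."""
    targets = np.asarray(targets, dtype=float)
    z = np.zeros(targets.shape[1])   # search sequence, starts inside P
    g_prev = np.zeros_like(z)        # no gradient observed before round 1
    plays = []
    for target in targets:
        x = project_ball(z - eta * g_prev, radius)  # decision for this round
        g = x - target                              # gradient of c_t at x_t
        z = project_ball(z - eta * g, radius)       # update the search point
        plays.append(x)
        g_prev = g
    return np.array(plays)

def quadratic_regret(plays, targets, radius=1.0):
    """sum_t c_t(x_t) - min_{x in P} sum_t c_t(x). The sum of the quadratics
    equals 0.5*T*||x - mean(a)||^2 + const, so its minimizer over the ball
    is the projection of the mean target."""
    targets = np.asarray(targets, dtype=float)
    x_star = project_ball(targets.mean(axis=0), radius)
    incurred = 0.5 * np.sum((plays - targets) ** 2)
    best = 0.5 * np.sum((x_star - targets) ** 2)
    return float(incurred - best)

def sequential_variation_quadratic(targets):
    """VAR_s^T for these costs: sum_t ||a_{t+1} - a_t||^2 (x-independent)."""
    return float(np.sum(np.diff(np.asarray(targets, dtype=float), axis=0) ** 2))

a = np.tile([0.3, -0.2], (200, 1))  # identical cost functions: VAR_s^T = 0
plays = optimistic_prox(a)
print(sequential_variation_quadratic(a), quadratic_regret(plays, a))
```

With identical cost functions the sequential variation is 0, and in this toy run the regret settles near a constant (about 0.087) rather than growing with \( T \), which is the kind of behavior a variation-based bound permits.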

Regret Bound by Variation for Online Convex Optimization

On Online Optimization: Dynamic Regret Analysis of Strongly Convex and Smooth Problems

Online distributed optimization with stochastic gradients: high probability bound of regrets

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Dynamic Regret of Convex and Smooth Functions

Small-loss Adaptive Regret for Online Convex Optimization

A Unified Framework for Analyzing Meta-algorithms in Online Convex Optimization

Improved Regret for Bandit Convex Optimization with Delayed Feedback

Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods

The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

Adaptive Regret of Convex and Smooth Functions

Projection-Free Bandit Convex Optimization over Strongly Convex Sets

Second Order Methods for Bandit Optimization and Control

Variance-Dependent Regret Bounds for Non-stationary Linear Bandits

Risk-Averse No-Regret Learning in Online Convex Games

Online Convex Optimization with Memory and Limited Predictions

Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret

Risk-Averse Stochastic Convex Bandit

Improved Regret Bounds for Online Kernel Selection under Bandit Feedback

Online and Bandit Algorithms for Nonstationary Stochastic Saddle-Point Optimization