Abstract: We introduce an online convex optimization algorithm that uses projected subgradient descent with optimal adaptive learning rates. Our method provides a second-order, minimax-optimal dynamic regret guarantee (i.e., one that depends on the sum of squared subgradient norms) for a sequence of general convex functions, which may lack strong convexity, smoothness, exp-concavity, or even proper Lipschitz continuity. The regret guarantee holds against any comparator decision sequence with bounded path variation (i.e., the sum of the distances between successive decisions). We derive a lower bound on the worst-case second-order dynamic regret by incorporating the actual subgradient norms. We show that this lower bound matches our regret guarantee up to a constant factor, which makes our algorithm minimax optimal. We also derive an extension that learns in each decision coordinate individually. We demonstrate how to best preserve our regret guarantee in a truly online manner when the bound on the path variation of the comparator sequence grows over time, or when feedback about this bound arrives only partially as time goes on. We further build on our algorithm to eliminate the need for any knowledge of the comparator path variation, and provide minimax-optimal second-order regret guarantees with no a priori information. Our approach can compete against all comparator sequences simultaneously (universally) in a minimax-optimal manner, i.e., each regret guarantee depends on the respective comparator path variation. We discuss modifications to our approach that reduce its time, computation, and memory complexity. We further improve our results by making the regret guarantees depend on the diameters of the comparator sets in addition to the respective path variations.

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

Eluder-based Regret for Stochastic Contextual MDPs

Contextual Inverse Optimization: Offline and Online Learning

Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization

Online Convex Optimization in Adversarial Markov Decision Processes

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

Refined Regret for Adversarial MDPs with Linear Function Approximation

Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

Online Markov Decision Processes with Non-Oblivious Strategic Adversary

Simple Regret Minimization for Contextual Bandits

On the Computational Efficiency of Adaptive and Dynamic Regret Minimization

Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs

Learning Efficiently Function Approximation for Contextual MDP

Second Order Bounds for Contextual Bandits with Function Approximation

Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret

$\sqrt{n}$-Regret for Learning in Markov Decision Processes with Function Approximation and Low Bellman Rank

Dynamic Regret of Online Markov Decision Processes

Contextual Decision-Making with Knapsacks Beyond the Worst Case