Abstract: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression oracle to estimate the unknown safety constraint, and converts the predictions of an online learning oracle to predictions that adhere to the unknown safety constraint. On the theoretical side, our algorithm's regret can be bounded by the regret of the online regression and online learning oracles, the eluder dimension of the model class containing the unknown safety constraint, and a novel complexity measure that captures the difficulty of safe learning. We complement our result with an asymptotic lower bound that shows that the aforementioned complexity measure is necessary. When the constraints are linear, we instantiate our result to provide a concrete algorithm with $\sqrt{T}$ regret using a scaling transformation that balances optimistic exploration with pessimistic constraint satisfaction.
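The abstract leaves the online regression oracle abstract. One standard choice is online ridge regression. This minimal sketch assumes three things:

- The constraint is linear, $g(a) = \langle \theta^*, a \rangle$.
- A noisy value of $g$ is observed at the played action after each round.
- The oracle should return an estimate $\hat\theta$ together with a per-action confidence width $\|a\|_{V^{-1}}$.

The names `OnlineRidgeRegression` and `simulate`, the regularizer `lam`, and the Gaussian noise model are hypothetical illustrations, not the paper's construction.

```python
import math
import random


class OnlineRidgeRegression:
    """Online ridge regression for a linear constraint value <theta, a>.

    Keeps V^{-1} = (lam * I + sum_s a_s a_s^T)^{-1} current with
    Sherman-Morrison rank-one updates, so each round costs O(d^2).
    """

    def __init__(self, dim, lam=1.0):
        self.dim = dim
        # V^{-1} starts at (1 / lam) * I.
        self.v_inv = [[1.0 / lam if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        # b accumulates sum_s y_s * a_s.
        self.b = [0.0] * dim

    def _v_inv_times(self, x):
        return [sum(self.v_inv[i][j] * x[j] for j in range(self.dim))
                for i in range(self.dim)]

    def theta_hat(self):
        """Current ridge estimate theta_hat = V^{-1} b."""
        return self._v_inv_times(self.b)

    def width(self, a):
        """Confidence width ||a||_{V^{-1}} = sqrt(a^T V^{-1} a)."""
        va = self._v_inv_times(a)
        return math.sqrt(max(0.0, sum(x * y for x, y in zip(a, va))))

    def update(self, a, y):
        """Add one noisy observation y of the constraint value at action a."""
        va = self._v_inv_times(a)  # V^{-1} a; V^{-1} is symmetric
        denom = 1.0 + sum(x * z for x, z in zip(a, va))
        for i in range(self.dim):
            for j in range(self.dim):
                # Sherman-Morrison: subtract V^{-1} a a^T V^{-1} / (1 + a^T V^{-1} a)
                self.v_inv[i][j] -= va[i] * va[j] / denom
            self.b[i] += y * a[i]


def simulate(theta_true, n_rounds=2000, noise=0.05, seed=0):
    """Hypothetical demo: feed noisy constraint values at random actions in [-1, 1]^d."""
    rng = random.Random(seed)
    oracle = OnlineRidgeRegression(dim=len(theta_true))
    for _ in range(n_rounds):
        a = [rng.uniform(-1.0, 1.0) for _ in theta_true]
        y = sum(t * x for t, x in zip(theta_true, a)) + rng.gauss(0.0, noise)
        oracle.update(a, y)
    return oracle
```

After enough rounds, `simulate([0.5, -0.3]).theta_hat()` should be close to the true parameter. The confidence width also shrinks as data accumulates. A shrinking width is what lets a safe learner gradually trust actions closer to the constraint boundary.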

What problem does this paper attempt to address?

The paper addresses online learning under unknown constraints. At every round, the action chosen by the learner must satisfy an unknown safety constraint. The goal is to minimize regret with respect to the best safe action in hindsight while satisfying the constraint with high probability on each round.

To this end, the paper proposes a general meta-algorithm for safe learning. It uses an online regression oracle to estimate the unknown constraint, and it converts the predictions of an online learning oracle into predictions that satisfy that constraint. The algorithm handles adversarial contexts, arbitrary action sets, and general model classes.

The algorithm's regret is bounded in terms of four quantities:
- the regret of the online regression oracle;
- the regret of the online learning oracle;
- the eluder dimension of the model class containing the constraint;
- a novel complexity measure that captures the tension, at each round, between minimizing regret and acquiring information about the unknown constraint.

An asymptotic lower bound shows that this measure is necessary: when it is large, no safe algorithm can achieve diminishing regret. For linear and generalized linear constraints, the paper gives explicit algorithms with $\sqrt{T}$ regret, improving on the previous best bound of $O(T^{2/3})$.

In summary, the key contributions of the paper are:
1. A safe learning meta-algorithm for unknown constraints, built on online regression and online learning oracles.
2. A new complexity measure that captures the trade-off between regret minimization and information acquisition at each round.
3. A simple algorithm for the linear and generalized linear settings with a $\sqrt{T}$ regret bound, improving on the previous $O(T^{2/3})$ result.
4. An asymptotic lower bound showing that the proposed complexity measure is necessary.
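The $\sqrt{T}$ result for linear constraints rests on a scaling transformation. This sketch shows only its pessimistic ingredient: shrinking a proposed action toward a known-safe action until a conservative estimate of the constraint holds. It does not reproduce the optimistic exploration component that the abstract says the transformation balances against pessimism. The sketch relies on these hypothetical assumptions:

- The constraint has the form $\langle \theta^*, a \rangle \le c$ with a known threshold $c > 0$, so the zero action is safe.
- $\theta^*$ lies in the confidence ellipsoid $\|\theta - \hat\theta\|_V \le \beta$.
- The action set contains $\alpha a$ for every $\alpha \in [0, 1]$.

The function name `pessimistic_scale` and the parameters `c` and `beta` are also hypothetical.

```python
import math


def pessimistic_scale(a, theta_hat, v_inv, beta, c):
    """Shrink a proposed action toward the known-safe action 0 (hypothetical setup).

    Assumes the constraint <theta*, a> <= c with c > 0, that theta* lies in
    {theta : ||theta - theta_hat||_V <= beta}, and that alpha * a is a valid
    action for every alpha in [0, 1]. Returns (alpha, alpha * a).
    """
    if c <= 0:
        raise ValueError("threshold c must be positive so that 0 is strictly safe")
    d = len(a)
    mean = sum(t * x for t, x in zip(theta_hat, a))
    va = [sum(v_inv[i][j] * a[j] for j in range(d)) for i in range(d)]
    width = math.sqrt(max(0.0, sum(x * y for x, y in zip(a, va))))
    # Pessimistic (upper) estimate of <theta*, a>, valid by Cauchy-Schwarz
    # whenever theta* lies in the confidence ellipsoid.
    upper = mean + beta * width
    # upper is positively homogeneous: upper(alpha * a) = alpha * upper(a)
    # for alpha >= 0, so alpha = c / upper makes the scaled bound equal c.
    alpha = 1.0 if upper <= c else c / upper
    return alpha, [alpha * x for x in a]
```

Here is a worked example with `theta_hat = [1.0, 0.0]`, `v_inv = 0.01 * I`, `beta = 1.0`, and `c = 1.0`:

- The action `[2.0, 0.0]` has a pessimistic estimate of $2 + 0.2 = 2.2$. It is therefore scaled by $1/2.2$.
- The action `[0.5, 0.0]` has a pessimistic estimate of $0.55 \le 1$. It is returned unchanged.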

Online Learning with Unknown Constraints

Online Learning: Stochastic and Constrained Adversaries

Optimistic Safety for Online Convex Optimization with Unknown Linear Constraints

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

Efficient Constrained Regret Minimization

Stronger Regret Bounds for Safe Online Reinforcement Learning in the Linear Quadratic Regulator

The Interplay Between Stability and Regret in Online Learning

Adaptive Online Learning in Dynamic Environments

Safe Online Convex Optimization with Multi-Point Feedback

Online Learning with Sublinear Best-Action Queries

Online Control of Unknown Time-Varying Dynamical Systems

Online Learning under Adversarial Nonlinear Constraints

Online Non-stochastic Control with Partial Feedback

Online Stackelberg Optimization via Nonlinear Control

Avoiding Catastrophe in Online Learning by Asking for Help

Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning

Optimal Algorithms for Online Convex Optimization with Adversarial Constraints

Fully Unconstrained Online Learning

Online Learning with Primary and Secondary Losses

Oracle-Efficient Hybrid Online Learning with Unknown Distribution

Online Learning and Solving Infinite Games with an ERM Oracle