Online Learning with Unknown Constraints

Karthik Sridharan, Seung Won Wilson Yoo
2024-03-07
Abstract: We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight while simultaneously satisfying the safety constraint with high probability on each round. We provide a general meta-algorithm that leverages an online regression oracle to estimate the unknown safety constraint, and converts the predictions of an online learning oracle to predictions that adhere to the unknown safety constraint. On the theoretical side, our algorithm's regret can be bounded by the regret of the online regression and online learning oracles, the eluder dimension of the model class containing the unknown safety constraint, and a novel complexity measure that captures the difficulty of safe learning. We complement our result with an asymptotic lower bound that shows that the aforementioned complexity measure is necessary. When the constraints are linear, we instantiate our result to provide a concrete algorithm with $\sqrt{T}$ regret using a scaling transformation that balances optimistic exploration with pessimistic constraint satisfaction.
Machine Learning, Artificial Intelligence, Statistics Theory
What problem does this paper attempt to address?
The paper addresses online learning under unknown constraints. The actions chosen by the learner at each time step must satisfy an unknown safety constraint, and the goal is to minimize regret relative to the best safe action in hindsight while satisfying the safety constraint with high probability at every time step.

To achieve this, the paper proposes a safe learning algorithm that uses an online regression oracle to estimate the unknown safety constraint and transforms the predictions of an online learning oracle into predictions that satisfy that constraint. The algorithm handles adversarial contexts and arbitrary action sets and model classes. Its regret is bounded in terms of the regret of the online regression oracle, the regret of the online learning oracle, the eluder dimension of the model class, and a novel complexity measure.

This complexity measure captures the inherent tension between minimizing regret and acquiring information about the unknown constraint at each step. Through an asymptotic lower bound, the authors show that when this measure is large, no safe algorithm can achieve diminishing regret. For linear and generalized linear settings, the paper provides explicit algorithms with a \(\sqrt{T}\) regret bound, which significantly improves upon the previous best result of \(O(T^{2/3})\).

In summary, the key contributions of the paper are:
1. A safe learning algorithm for unknown constraints that uses online regression and online learning oracles.
2. A new complexity measure that captures the balance between regret minimization and information acquisition at each step.
3. A simple algorithm for linear and generalized linear settings with a \(\sqrt{T}\) regret bound, which is better than previous results.
4. An asymptotic lower bound that highlights the importance of the proposed complexity measure.
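
To make the idea of pessimistic constraint satisfaction in the linear setting more concrete, here is a minimal Python sketch. It is not the paper's algorithm. It assumes a single linear constraint of the form <theta, a> <= threshold with threshold > 0 (so the origin is a known safe action), uses ridge regression as a stand-in for the online regression oracle, and treats the confidence radius `beta` as a given constant. The names `LinearConstraintEstimator` and `scale_to_safe` are hypothetical.

```python
import numpy as np


class LinearConstraintEstimator:
    """Ridge-regression stand-in for an online regression oracle (assumed)."""

    def __init__(self, dim, lam=1.0, beta=1.0):
        self.V = lam * np.eye(dim)   # regularized Gram matrix of played actions
        self.u = np.zeros(dim)       # running sum of feedback * action
        self.beta = beta             # confidence radius, assumed to be given

    def update(self, action, feedback):
        """Record noisy feedback on the constraint value <theta, action>."""
        self.V += np.outer(action, action)
        self.u += feedback * action

    def upper_bound(self, action):
        """Pessimistic (upper confidence) estimate of <theta, action>."""
        theta_hat = np.linalg.solve(self.V, self.u)
        width = np.sqrt(action @ np.linalg.solve(self.V, action))
        return theta_hat @ action + self.beta * width


def scale_to_safe(estimator, action, threshold):
    """Shrink a proposed action toward the origin until it is pessimistically safe.

    Assumes threshold > 0. The upper bound is positively homogeneous in the
    action, so the largest safe scaling factor has a closed form.
    """
    ucb = estimator.upper_bound(action)
    if ucb <= threshold:
        return action                      # already certified safe
    return (threshold / ucb) * action      # scaled action has upper bound == threshold


# Usage: with no data, theta_hat = 0 and the bound is beta * ||action||.
est = LinearConstraintEstimator(dim=2)
proposed = np.array([3.0, 4.0])            # e.g. output of an online learning oracle
played = scale_to_safe(est, proposed, threshold=1.0)
```

In this sketch the learner's proposed action is played unchanged whenever the current estimate already certifies it as safe; otherwise it is scaled back just far enough for the pessimistic bound to meet the threshold. As more feedback arrives the confidence width shrinks, so less scaling is needed.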