On Sampling Efficiency Optimization in Constrained Reinforcement Learning

Qing-Shan Jia
DOI: https://doi.org/10.1109/aim55361.2024.10636978
2024-01-01
Abstract: Constrained reinforcement learning is of great practical interest because constraints are pervasive in applications. Beyond the typical constraints placed directly on the state space, simulation-based constraints are harder to address, because evaluating both the performance and the feasibility of a policy is noisy and time-consuming. We consider this important problem in this work and make two contributions. First, we develop an algorithm based on Q-learning that iteratively improves the performance and the feasibility of a policy, and we show its global convergence. Second, for online learning, we develop an algorithm that controls the sampling among the action space and is shown to asymptotically maximize the probability of correctly selecting the best feasible action.
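To make the selection problem in the second contribution concrete, the following Python sketch shows one way the "best feasible action" could be picked from noisy samples. It is not the paper's algorithm: it assumes a uniform sampling budget per action, Gaussian noise, and a single scalar cost constraint, and every name in it (`select_best_feasible`, `simulate`, `cost_limit`) is hypothetical. A sampling-control rule of the kind the abstract describes would replace the uniform budget with an adaptive one.

```python
import random
from statistics import mean


def select_best_feasible(perf_samples, cost_samples, cost_limit):
    """Return the index of the action with the highest mean performance
    among actions whose mean cost is at most cost_limit.

    Returns None when no sampled action is estimated to be feasible.
    """
    best_action, best_value = None, float("-inf")
    for action, (perf, cost) in enumerate(zip(perf_samples, cost_samples)):
        if not perf or not cost:
            continue  # no samples yet, so the action cannot be judged
        if mean(cost) > cost_limit:
            continue  # estimated to be infeasible
        value = mean(perf)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


def simulate(true_perf, true_cost, cost_limit, samples_per_action,
             noise=1.0, seed=0):
    """Draw the same number of noisy samples for every action (uniform
    allocation, an assumption of this sketch) and select from them."""
    rng = random.Random(seed)
    perf = [[rng.gauss(p, noise) for _ in range(samples_per_action)]
            for p in true_perf]
    cost = [[rng.gauss(c, noise) for _ in range(samples_per_action)]
            for c in true_cost]
    return select_best_feasible(perf, cost, cost_limit)
```

With noise present, the action returned by `simulate` can differ from the true best feasible action; the probability that it matches is the quantity the abstract calls the probability of correct selection, and it depends on how the sampling budget is spread across actions.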