Constrained Cross-Entropy Method for Safe Reinforcement Learning

Min Wen,Ufuk Topcu
DOI: https://doi.org/10.1109/tac.2020.3015931
IF: 6.549
2021-07-01
IEEE Transactions on Automatic Control
Abstract:We study a safe reinforcement learning problem, in which the constraints are defined as the expected cost over finite-length trajectories. We propose a constrained cross-entropy-based method to solve this problem. The key idea is to transform the original constrained optimization problem into an unconstrained one with a surrogate objective. The method explicitly tracks its performance with respect to constraint satisfaction and thus is well suited for safety-critical applications. We show that the asymptotic behavior of the proposed algorithm can be almost-surely described by that of an ordinary differential equation. Then, we give sufficient conditions on the properties of this differential equation for the convergence of the proposed algorithm. At last, we show the performance of the proposed algorithm in two simulation examples. In a constrained linear–quadratic regulator example, we observe that the algorithm converges to the global optimum with high probability. In a 2-D navigation example, we find that the algorithm effectively learns feasible policies without assumptions on the feasibility of initial policies, even with non-Markovian objective functions and constraint functions.
automation & control systems,engineering, electrical & electronic
What problem does this paper attempt to address?