Abstract: In the trial-and-error mechanism of reinforcement learning (RL), a notorious contradiction arises when we expect to learn a safe policy: how can a safe policy be learned without enough data or a prior model of the dangerous region? Existing methods mostly penalize dangerous actions a posteriori, which means that the agent is not penalized until it experiences danger. As a result, the agent cannot learn a zero-violation policy even after convergence; if it did, it would no longer receive any penalty and would lose its knowledge about danger. In this paper, we propose the safe set actor-critic (SSAC) algorithm, which confines the policy update using safety-oriented energy functions, also called safety indexes. The safety index is designed to increase rapidly for potentially dangerous actions, which allows us to locate the safe set in the action space, or the control safe set. Therefore, we can identify dangerous actions before taking them and obtain a zero constraint-violation policy after convergence. We claim that the energy function can be learned in a model-free manner, similar to learning a value function. By using the energy function transition as the constraint objective, we formulate a constrained RL problem. We prove that our Lagrangian-based solutions ensure that the learned policy converges to the constrained optimum under some assumptions. The proposed algorithm is evaluated on both complex simulation environments and a hardware-in-the-loop (HIL) experiment with a real controller from an autonomous vehicle. Experimental results suggest that the converged policy in all environments achieves zero constraint violation and performance comparable to model-based baselines.
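To make the idea of an energy-function constraint with a Lagrangian update concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not give the form of the safety index or of the constraint, so everything here is an assumption made for illustration: the distance-based index in `safety_index`, its parameters `d_min`, `k`, and `sigma`, the decrease margin `eta`, and the projected dual ascent step in `dual_ascent_step`.

```python
def safety_index(distance, distance_rate, d_min=1.0, k=0.5, sigma=0.0):
    """Hypothetical energy function (assumed form, not from the abstract).

    It grows when the agent is close to the danger (small distance) or is
    approaching it (negative distance_rate).
    """
    return sigma + d_min - distance - k * distance_rate


def energy_constraint(phi, phi_next, eta=0.1):
    """Assumed constraint on the energy transition.

    A value <= 0 means the energy either decreased by at least eta or
    stayed non-positive; a value > 0 counts as a violation.
    """
    return phi_next - max(phi - eta, 0.0)


def dual_ascent_step(lam, violation, lr=0.05):
    """Projected gradient ascent on a Lagrange multiplier (kept >= 0)."""
    return max(0.0, lam + lr * violation)


# One transition where the agent keeps approaching the danger.
phi = safety_index(distance=0.5, distance_rate=-1.0)
phi_next = safety_index(distance=0.4, distance_rate=-1.0)
violation = energy_constraint(phi, phi_next)

# The multiplier grows while the constraint is violated, so the
# penalty on such actions increases in the policy objective.
lam = dual_ascent_step(0.0, violation)
```

In this sketch the multiplier shrinks back toward zero once the energy constraint is satisfied, which is the usual behavior of Lagrangian methods for constrained problems; how SSAC learns the index and the multiplier in practice is described in the paper itself, not here.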

A novel Z-function-based completely model-free reinforcement learning method to finite-horizon zero-sum game of nonlinear system

An efficient model‐free adaptive optimal control of continuous‐time nonlinear non‐zero‐sum games based on integral reinforcement learning with exploration

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning

Zero‐sum game for nonlinear multiagent systems with full‐state constraints

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Novel single-loop policy iteration for linear zero-sum games

Non‐zero‐sum games of discrete‐time Markov jump systems with unknown dynamics: An off‐policy reinforcement learning method

Relaxed Policy Iteration Algorithm for Nonlinear Zero-Sum Games with Application to H-infinity Control

Reinforcement Learning for Finite-Horizon H∞ Tracking Control of Unknown Discrete Linear Time-Varying System

Robust policy iteration for continuous-time stochastic $H_\infty$ control problem with unknown dynamics

Linear-quadratic zero-sum mean-field type games: Optimality conditions and policy optimization

Two‐loop reinforcement learning algorithm for finite‐horizon optimal control of continuous‐time affine nonlinear systems

Policy Iteration Reinforcement Learning Method for Continuous-Time Linear-Quadratic Mean-Field Control Problems

Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning

Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate

Deep Reinforcement Learning for Infinite Horizon Mean Field Problems in Continuous Spaces

Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

Reinforcement Learning for Infinite-Dimensional Systems

Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Learn Zero-Constraint-Violation Policy in Model-Free Constrained Reinforcement Learning

A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games