Abstract: This paper presents a novel composite obstacle avoidance control method that adaptively generates safe motion trajectories for autonomous systems. First, system safety is described using forward invariance, and a barrier function is encoded into the cost function so that the obstacle avoidance problem can be formulated as an infinite-horizon optimal control problem. Next, a safe reinforcement learning framework is proposed that combines model-based policy iteration with state-following-based approximation. Using both real-time data and extrapolated experience data, this learning design is implemented through an actor-critic structure. The critic networks are tuned by gradient-descent adaptation, and the actor networks produce adaptive control policies via gradient projection. Then, system stability and weight convergence are analyzed theoretically using the Lyapunov method. Finally, the proposed learning-based controller is demonstrated on a two-dimensional single integrator system and a nonlinear unicycle kinematic system. Simulation results show that the system (agent) smoothly reaches the target point while keeping a safe distance from each obstacle. In addition, three other avoidance control methods are used for side-by-side comparisons that verify several claimed advantages of the present method.

Note to Practitioners: This paper is motivated by the obstacle avoidance problem in the real-time navigation of an agent to a target point, which applies to practical autonomous systems such as vehicles and robots. Pre-generative methods and reactive methods have been widely employed to generate safe motion trajectories in environments with obstacles. However, these methods cannot strike a good balance between safety and optimality. In this paper, the obstacle avoidance problem is formulated as an optimal control problem, and a safe reinforcement learning method is designed to generate safe motion trajectories. This method combines the advantages of model-based policy iteration and state-following-based approximation: the former ensures regional optimality, while the latter ensures local safety. Based on the proposed adaptive tuning laws, engineers can design learning-based avoidance controllers for environments with static obstacles. Future research will address the dynamic avoidance problem with moving obstacles.
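The abstract does not state the paper's barrier function or cost weights. The snippet below is a minimal sketch that assumes a reciprocal barrier for disc-shaped obstacles and illustrative weights `q`, `rho`, and `w`, all hypothetical and not taken from the paper. It shows how encoding a barrier into the running cost of a two-dimensional single integrator penalizes trajectories that pass close to an obstacle. The sketch only evaluates a truncated approximation of the infinite-horizon cost along a fixed linear-policy rollout. It does not implement the paper's actor-critic learning, policy iteration, or state-following approximation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Obstacle = Tuple[Point, float]  # hypothetical representation: (center, radius) of a disc


def barrier(x: Point, obstacles: Sequence[Obstacle], eps: float = 1e-9) -> float:
    """Assumed reciprocal barrier B(x) = sum_i 1 / h_i(x), h_i(x) = ||x - c_i||^2 - r_i^2.

    h_i(x) > 0 outside the i-th disc, so B is finite in the safe set and grows
    without bound as x approaches an obstacle boundary; on or inside a disc, return inf.
    """
    total = 0.0
    for (cx, cy), r in obstacles:
        h = (x[0] - cx) ** 2 + (x[1] - cy) ** 2 - r ** 2
        if h <= eps:
            return math.inf
        total += 1.0 / h
    return total


def running_cost(x, u, target, obstacles, q=1.0, rho=1.0, w=0.1) -> float:
    """r(x, u) = q*||x - x_target||^2 + rho*||u||^2 + w*B(x); weights are illustrative."""
    ex, ey = x[0] - target[0], x[1] - target[1]
    return q * (ex * ex + ey * ey) + rho * (u[0] ** 2 + u[1] ** 2) + w * barrier(x, obstacles)


def simulate(start: Point, target: Point, k: float = 1.0, dt: float = 0.01, steps: int = 500):
    """Forward-Euler rollout of x_dot = u under a fixed linear policy u = -k (x - x_target)."""
    xs, us = [], []
    x = start
    for _ in range(steps):
        u = (-k * (x[0] - target[0]), -k * (x[1] - target[1]))
        xs.append(x)
        us.append(u)
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
    return xs, us


def truncated_cost(xs, us, dt, target, obstacles) -> float:
    """Left Riemann sum approximating the infinite-horizon cost over a finite window."""
    return sum(running_cost(x, u, target, obstacles) * dt for x, u in zip(xs, us))


if __name__ == "__main__":
    start, target, dt = (0.0, 0.0), (2.0, 0.0), 0.01
    xs, us = simulate(start, target, dt=dt)
    near = [((1.0, 0.5), 0.2)]  # disc close to the straight-line path
    far = [((1.0, 1.5), 0.2)]   # same disc shifted away from the path
    print("cost with nearby obstacle:", truncated_cost(xs, us, dt, target, near))
    print("cost with distant obstacle:", truncated_cost(xs, us, dt, target, far))
```

The rollout is identical in both cases, so the difference in cost comes entirely from the barrier term. In an actor-critic design, the critic would approximate the value function of such a barrier-augmented cost, and the actor would seek a policy that lowers it.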

Safe Reinforcement Learning and Adaptive Optimal Control With Applications to Obstacle Avoidance Problem