Abstract: This paper presents a model-free reinforcement learning (RL) algorithm to solve the risk-averse optimal control (RAOC) problem for discrete-time nonlinear systems. While successful RL algorithms have been presented to learn optimal control solutions under epistemic uncertainties (i.e., lack of knowledge of system dynamics), they do so by optimizing the expected utility of outcomes, which ignores the variance of cost under aleatory uncertainties (i.e., randomness). Performance-critical systems, however, must not only optimize the expected performance but also reduce its variance to avoid performance fluctuation during RL's course of operation. To solve the RAOC problem, this paper presents three variants of RL algorithms and analyzes their advantages and suitability for different situations and systems: 1) a one-shot, static convex-program-based RL algorithm, 2) an iterative value iteration (VI) algorithm that solves a linear programming (LP) problem at each iteration, and 3) an iterative policy iteration (PI) algorithm that solves a convex optimization problem at each iteration and guarantees the stability of the consecutive control policies. Convergence of the exact optimization problems, which are infinite-dimensional in all three cases, to the optimal risk-averse value function is shown. To turn these into standard optimization problems with finitely many decision variables and constraints, function approximation for value estimation and constraint sampling are leveraged. Data-driven implementations of these algorithms are provided based on the Q-function, which enables learning the optimal value without any knowledge of the system dynamics. The performance of the approximated solutions is also verified through a weighted sup-norm bound and the Lyapunov bound. A simulation example is provided to verify the effectiveness of the presented approach.
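
The contrast the abstract draws between optimizing expected cost and also penalizing its variance can be illustrated with a small sketch. This is not the paper's formulation: the mean-plus-variance score, the function names, the `risk_weight` parameter, and the sampled costs below are all assumptions chosen only to make the idea concrete.

```python
import statistics

def risk_averse_score(costs, risk_weight):
    """Mean cost plus a weighted variance penalty.

    Illustrative criterion only (an assumption); the paper's actual
    risk measure is not specified in the abstract.
    """
    return statistics.fmean(costs) + risk_weight * statistics.pvariance(costs)

# Hypothetical sampled costs for two candidate policies.
steady_costs = [10.0, 10.0, 10.0, 10.0]   # higher mean, no fluctuation
erratic_costs = [2.0, 18.0, 2.0, 14.0]    # lower mean, large fluctuation

def preferred(risk_weight):
    """Return the name of the policy with the lower (better) score."""
    steady = risk_averse_score(steady_costs, risk_weight)
    erratic = risk_averse_score(erratic_costs, risk_weight)
    return "steady" if steady <= erratic else "erratic"

if __name__ == "__main__":
    # risk_weight = 0 reduces the score to the expected cost alone.
    print("risk-neutral choice:", preferred(0.0))
    print("risk-averse choice: ", preferred(0.1))
```

With a zero weight the score reduces to the expected cost, so the policy with the lower mean is chosen despite its fluctuation; a positive weight can reverse that choice in favor of the policy with steadier cost.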

Off-Policy Risk-Sensitive Reinforcement Learning-Based Constrained Robust Optimal Control

Optimal Control for Constrained Discrete-Time Nonlinear Systems Based on Safe Reinforcement Learning.

Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning

Off Policy Risk Sensitive Reinforcement Learning Based Optimal Tracking Control with Prescribe Performances

Optimal Robust Control of Nonlinear Uncertain System Via Off-Policy Integral Reinforcement Learning

Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Data-Driven Robust Control of Discrete-Time Uncertain Linear Systems Via Off-Policy Reinforcement Learning.

Robust Safe Reinforcement Learning Control of Unknown Continuous-Time Nonlinear Systems with State Constraints and Disturbances

Robust Near-optimal Control for Constrained Nonlinear System via Integral Reinforcement Learning

Reinforcement-Learning-Based Robust Controller Design for Continuous-Time Uncertain Nonlinear Systems Subject to Input Constraints

Reinforcement Learning-Based Control for Nonlinear Discrete-Time Systems with Unknown Control Directions and Control Constraints

Improved Off‐policy Reinforcement Learning Algorithm for Robust Control of Unmodeled Nonlinear System with Asymmetric State Constraints

Safety-Aware Optimal Control of Nonlinear Systems Using Off-Policy Reinforcement Learning

Off‐policy reinforcement learning algorithm for robust optimal control of uncertain nonlinear systems

Reinforcement Learning Control of Constrained Dynamic Systems with Uniformly Ultimate Boundedness Stability Guarantee

Robust Control of Uncertain Linear Systems Based on Reinforcement Learning Principles.

A Convex Programming Approach to Data-Driven Risk-Averse Reinforcement Learning

Adaptive Optimal Control of Discrete-Time Linear Systems with Discounted Value: Off-Policy Reinforcement Learning

Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Adaptive Optimal Robust Control for Uncertain Nonlinear Systems Using Neural Network Approximation in Policy Iteration