Abstract: In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the off-line training and the on-line play algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to that of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.
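
The link between on-line play and Newton's method can be illustrated with a minimal sketch on a scalar linear quadratic problem. This example is not taken from the abstract: the system x_{k+1} = a x_k + b u_k, the stage cost q x^2 + r u^2, the numerical values, and all function names are assumptions chosen for illustration. Under these assumptions, a terminal cost approximation K_tilde x^2 (standing in for the output of off-line training) is used by one-step lookahead (standing in for on-line play), and the cost coefficient of the resulting policy coincides with one Newton iteration for the Riccati equation K = F(K) started at K_tilde.

```python
import math

# Hypothetical scalar system x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.
# All numerical values are illustrative assumptions, not taken from the paper.
a, b, q, r = 1.0, 2.0, 1.0, 0.5


def riccati(K):
    """Riccati (Bellman) operator F acting on quadratic costs K*x^2."""
    return a * a * r * K / (r + b * b * K) + q


def riccati_slope(K):
    """Derivative F'(K) of the Riccati operator."""
    return (a * r / (r + b * b * K)) ** 2


def lookahead_gain(K):
    """Gain L of the one-step lookahead policy u = L*x with terminal cost K*x^2."""
    return -a * b * K / (r + b * b * K)


def policy_cost(L):
    """Coefficient K_L of the cost K_L*x^2 of the linear policy u = L*x."""
    closed_loop = a + b * L
    if abs(closed_loop) >= 1.0:
        raise ValueError("policy u = L*x is not stabilizing")
    return (q + r * L * L) / (1.0 - closed_loop ** 2)


def newton_step(K):
    """One Newton iteration for solving K = F(K), linearizing F at K."""
    slope = riccati_slope(K)
    return (riccati(K) - slope * K) / (1.0 - slope)


# Optimal coefficient K*: positive root of b^2*K^2 + (r - a^2*r - q*b^2)*K - q*r = 0.
c1 = r - a * a * r - q * b * b
K_star = (-c1 + math.sqrt(c1 * c1 + 4.0 * b * b * q * r)) / (2.0 * b * b)

K_tilde = 3.0  # assumed off-line approximation of K*
K_rollout = policy_cost(lookahead_gain(K_tilde))  # cost of the on-line play policy
K_newton = newton_step(K_tilde)  # Newton iterate from K_tilde

print("initial error      :", abs(K_tilde - K_star))
print("error after 1 step :", abs(K_rollout - K_star))
print("rollout == Newton  :", math.isclose(K_rollout, K_newton, rel_tol=1e-9))
```

In this sketch the two quantities agree because the linearization of F at K_tilde is exactly the cost recursion of the lookahead policy. The error of the lookahead policy is much smaller than the error of K_tilde, which is the kind of superlinear improvement the abstract attributes to Newton's method.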

Practical Reinforcement Learning of Stabilizing Economic MPC

Safe Reinforcement Learning Using Robust MPC

Reinforcement Learning Based on Real-Time Iteration NMPC

Reinforcement learning based MPC with neural dynamical models

Reinforcement Learning for Mixed-Integer Problems Based on MPC

Reinforcement Learning for MPC: Fundamentals and Current Challenges

Reinforcement Learning-Based Model Predictive Control for Discrete-Time Systems

Learning for MPC with stability & safety guarantees

A Safe Reinforcement Learning driven Weights-varying Model Predictive Control for Autonomous Vehicle Motion Control

AC4MPC: Actor-Critic Reinforcement Learning for Nonlinear Model Predictive Control

Practical Reinforcement Learning For MPC: Learning from sparse objectives in under an hour on a real robot

An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Learning-based MPC from Big Data Using Reinforcement Learning

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Data-Driven MPC for Nonlinear Systems with Reinforcement Learning

Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Model Predictive Control via On-Policy Imitation Learning

On the improvement of model-predictive controllers

Data-Driven MPC for Linear Systems Using Reinforcement Learning