Abstract: Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation, unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality. This is in contrast to classical control algorithms, which are typically model-based. One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually concerned with Markov decision processes, but most real environments are not time-discrete. The actual physical setting of RL consists of a digital agent and a time-continuous dynamical system. There is thus, in fact, yet another hyper-parameter: the agent sampling time. In this paper, we investigate the effects of prediction horizon and sampling time on two hybrid RL-MPC agents in a case study of mobile robot parking, which is itself a canonical control problem. We benchmark the agents against a simple variant of MPC. The sampling time showed a kind of "sweet spot" behavior, whereas the RL agents demonstrated merits at shorter horizons.

What problem does this paper attempt to address?

This paper addresses how sampling time and prediction horizon affect the performance of hybrid RL-MPC (model-predictive control) agents in predictive reinforcement learning (PRL). Specifically, it explores the following points:

1. **Sampling Time**: For digital agents interacting with continuous-time dynamical systems, the paper studies how different sampling times affect system performance. Too low a sampling time may degrade performance because the prediction is too short-sighted, while too high a sampling time may degrade performance because prediction errors accumulate.
2. **Prediction Horizon**: The paper examines how the length of the prediction horizon affects the learning and control performance of RL agents. A longer prediction horizon provides more long-term information, which helps improve the stability and performance of the system, but it also increases the computational complexity.
3. **Prediction Step Size**: The paper explores how changes in the prediction step size affect system performance, especially in RL algorithms combined with prediction mechanisms.

The paper evaluates the influence of these parameters on three predictive control methods (MPC, roll-out Q-learning, and stacked Q-learning) through a specific case study: the mobile robot parking problem. The experimental results show that sampling time and prediction horizon settings have a significant impact on the performance of the different methods. In particular, stacked Q-learning performs better than the other methods at shorter prediction horizons.

### Main Findings

- **Sampling Time**: Too low a sampling time leads to performance degradation because the controller is too short-sighted. As the sampling time increases, performance first improves, but beyond a certain point the prediction error begins to dominate and performance drops again.
- **Prediction Horizon**: Increasing the length of the prediction horizon usually improves the performance of all methods, but the computational cost increases accordingly.
- **Method Comparison**: At shorter prediction horizons, stacked Q-learning performs better than MPC and roll-out Q-learning, especially in terms of the number of successful parkings.

### Conclusion

The experiments indicate that predictive reinforcement learning (especially stacked Q-learning) can perform better than traditional MPC at shorter prediction horizons. This suggests new ways of integrating classical control theory and reinforcement learning in practical industrial applications.
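
The sampling-time trade-off summarized above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a unicycle model for the mobile robot, a forward-Euler predictor inside the agent, and a prediction step equal to the sampling time; none of these are confirmed by the summary. The names `unicycle_exact`, `unicycle_euler`, `lookahead_time`, and `prediction_error` are hypothetical. Under these assumptions, a fixed number of prediction steps covers less physical time when the sampling time is small (the "short-sighted" regime), while a large sampling time makes each predicted step less accurate.

```python
import math


def unicycle_exact(state, v, w, dt):
    """Exact sample-and-hold step of an (assumed) unicycle with constant inputs.

    state = (x, y, theta); v is linear speed, w is turn rate.
    """
    x, y, th = state
    if abs(w) < 1e-12:
        # Straight-line motion when the turn rate is (numerically) zero.
        return (x + v * dt * math.cos(th), y + v * dt * math.sin(th), th)
    th_next = th + w * dt
    return (
        x + v / w * (math.sin(th_next) - math.sin(th)),
        y - v / w * (math.cos(th_next) - math.cos(th)),
        th_next,
    )


def unicycle_euler(state, v, w, dt):
    """Forward-Euler prediction step, standing in for the agent's model."""
    x, y, th = state
    return (x + dt * v * math.cos(th), y + dt * v * math.sin(th), th + dt * w)


def lookahead_time(n_steps, dt):
    """Physical time covered by n_steps predictions of length dt each."""
    return n_steps * dt


def prediction_error(n_steps, dt, v=1.0, w=1.0):
    """Position gap between the Euler prediction and the exact trajectory."""
    exact = pred = (0.0, 0.0, 0.0)
    for _ in range(n_steps):
        exact = unicycle_exact(exact, v, w, dt)
        pred = unicycle_euler(pred, v, w, dt)
    return math.hypot(exact[0] - pred[0], exact[1] - pred[1])


if __name__ == "__main__":
    # Same number of prediction steps, different sampling times.
    for dt in (0.01, 0.1, 0.5):
        print(dt, lookahead_time(10, dt), prediction_error(10, dt))
```

With the number of prediction steps held fixed, shrinking `dt` shortens the look-ahead in seconds, and growing `dt` inflates the prediction error. That is one plausible mechanism behind the "sweet spot" the abstract reports, though the paper's own models and metrics may differ.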

Effects of sampling and horizon in predictive reinforcement learning

Model-Based Robot Learning Control with Uncertainty Directed Exploration

Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

An experimental study of two predictive reinforcement learning methods and comparison with model-predictive control

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

Learning Sampling Distributions for Model Predictive Control

Bridging RL Theory and Practice with the Effective Horizon

Estimation and Control Using Sampling-Based Bayesian Reinforcement Learning

On the Effective Horizon of Inverse Reinforcement Learning

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Experimental evaluations of model-based reinforcement learning combined with MPC

Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving

Actor-Critic Model Predictive Control

Neural Horizon Model Predictive Control -- Increasing Computational Efficiency with Neural Networks

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems

Learning-based MPC from Big Data Using Reinforcement Learning

Investigating Compounding Prediction Errors in Learned Dynamics Models