Effects of sampling and horizon in predictive reinforcement learning

Pavel Osinenko, Dmitrii Dobriborsci
DOI: https://doi.org/10.48550/arXiv.2108.04802
2021-08-24
Abstract: Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation, unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to achieve acceptable functionality. This is in contrast to classical control algorithms, which are typically model-based. One direction of research is the fusion of RL with such algorithms, especially model-predictive control (MPC). This, however, introduces new hyper-parameters related to the prediction horizon. Furthermore, RL is usually concerned with Markov decision processes, but most real environments are not time-discrete. The actual physical setting of RL consists of a digital agent and a time-continuous dynamical system. There is thus, in fact, yet another hyper-parameter: the agent sampling time. In this paper, we investigate the effects of prediction horizon and sampling time on two hybrid RL-MPC agents in a case study with mobile robot parking, which is in turn a canonical control problem. We benchmark the agents against a simple variant of MPC. The sampling time showed a kind of "sweet spot" behavior, whereas the RL agents demonstrated merits at shorter horizons.
Dynamical Systems, Systems and Control
What problem does this paper attempt to address?
This paper studies how sampling time and prediction horizon affect the performance of hybrid RL-MPC (model-predictive control) agents in predictive reinforcement learning. Specifically, it explores the following points:

1. **Sampling Time**: When a digital agent interacts with a continuous-time dynamical system, how does the choice of sampling time affect performance? Too short a sampling time may degrade performance because the prediction is too short-sighted, while too long a sampling time may degrade it through the accumulation of prediction errors.
2. **Prediction Horizon**: How does the length of the prediction horizon affect the learning and control performance of RL agents? A longer horizon provides more long-term information, which helps stability and performance, but it also increases computational complexity.
3. **Prediction Step Size**: How do changes in the prediction step size affect performance, especially in RL algorithms combined with prediction mechanisms?

The paper evaluates the influence of these parameters on three predictive control methods (MPC, roll-out Q-learning, and stacked Q-learning) in a case study: the mobile robot parking problem. The experimental results show that sampling time and prediction horizon settings have a significant impact on the performance of each method. In particular, stacked Q-learning performs better than the other methods at shorter prediction horizons.

### Main Findings

- **Sampling Time**: Too short a sampling time degrades performance because the controller is too short-sighted. As the sampling time increases, performance first improves, but beyond a certain point the prediction error begins to dominate and performance drops again.
- **Prediction Horizon**: Increasing the prediction horizon usually improves the performance of all methods, but the computational cost increases accordingly.
- **Method Comparison**: At shorter prediction horizons, stacked Q-learning performs better than MPC and roll-out Q-learning, especially in terms of the number of successful parkings.

### Conclusion

The paper shows experimentally, in this case study, that predictive reinforcement learning (especially stacked Q-learning) can outperform traditional MPC at shorter prediction horizons. This suggests new ways to integrate classical control theory and reinforcement learning in practical industrial applications.
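
The sampling-time trade-off described in the findings can be illustrated with a small numerical sketch. This is not the paper's code or its robot model. It assumes a kinematic unicycle with constant inputs, an explicit Euler predictor whose step equals the sampling time, and a fixed number of prediction steps. All function names and parameter values are hypothetical. Under these assumptions, a short sampling time gives a short lookahead in physical time, while a long sampling time gives a long lookahead but a larger prediction error.

```python
import math


def euler_predict(state, v, omega, delta, n_steps):
    """Predict a unicycle state with n_steps explicit Euler steps of size delta.

    Assumed model (not taken from the paper): x' = v cos(theta),
    y' = v sin(theta), theta' = omega, with constant inputs v and omega.
    """
    x, y, theta = state
    for _ in range(n_steps):
        x += delta * v * math.cos(theta)
        y += delta * v * math.sin(theta)
        theta += delta * omega
    return x, y, theta


def exact_predict(state, v, omega, t):
    """Closed-form unicycle state after time t under constant v and omega."""
    x, y, theta = state
    if abs(omega) < 1e-12:
        # Straight-line motion when the turning rate is (numerically) zero.
        return x + v * t * math.cos(theta), y + v * t * math.sin(theta), theta
    theta_t = theta + omega * t
    return (
        x + v / omega * (math.sin(theta_t) - math.sin(theta)),
        y - v / omega * (math.cos(theta_t) - math.cos(theta)),
        theta_t,
    )


def lookahead_and_error(delta, n_steps, state=(0.0, 0.0, 0.0), v=1.0, omega=1.0):
    """Return (lookahead in seconds, position error of the Euler prediction)."""
    lookahead = delta * n_steps
    px, py, _ = euler_predict(state, v, omega, delta, n_steps)
    ex, ey, _ = exact_predict(state, v, omega, lookahead)
    return lookahead, math.hypot(px - ex, py - ey)


if __name__ == "__main__":
    # Fixed number of prediction steps, varying sampling time (illustrative values).
    for delta in (0.01, 0.05, 0.1, 0.5):
        lookahead, error = lookahead_and_error(delta, n_steps=5)
        print(f"delta={delta:5.2f} s  lookahead={lookahead:5.2f} s  error={error:.4f} m")
```

The sketch only measures lookahead and prediction error. It does not reproduce the closed-loop performance measurements reported in the paper, which also depend on the cost function and the optimizer.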