Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Luka Antonyshyn, Sidney Givigi
DOI: https://doi.org/10.1007/s10846-024-02118-y
2024-07-11
Journal of Intelligent and Robotic Systems: Theory and Applications
Abstract: Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.
What problem does this paper attempt to address?
The paper targets two open problems in reinforcement learning, sparse rewards and sample efficiency, particularly as they arise in robot control and other cyber-physical systems:

1. **Sparse Rewards**: In many practical tasks, especially goal-oriented ones, the reward signal is sparse: a reward is obtained only when a specific goal is achieved. Without frequent feedback to guide action selection, it is difficult for an agent to find an effective policy through exploration.
2. **Sample Efficiency**: Sample efficiency refers to how many interactions with the environment an agent needs before it learns an effective policy. In applications such as robot control, the constraints of the physical system limit the number of experiments that can be run, so improving sample efficiency is crucial for learning quickly.

To address both problems, the paper proposes Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control. It combines system identification, value function approximation, and sampling-based optimization to select actions. In the reported experiments, DVPMC matches a predictive control approach on the dense reward task and outperforms model-free and model-based learning algorithms on the sparse reward task in both sample efficiency and performance. An agent trained in simulation with DVPMC was also verified on a real robot playing the reach-avoid game.
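The combination of a predictive model, a value function, and sampling-based optimization can be illustrated with a minimal sketch. This is not the paper's implementation: the planner below uses plain random shooting as one possible sampling-based optimizer, and the names `plan_action`, `toy_dynamics`, `sparse_reward`, and `toy_value` are hypothetical. The dynamics and value function are hand-written stand-ins for what would be learned networks, and the toy task (a 1-D point that must reach a goal position) is assumed purely for illustration.

```python
import numpy as np

GOAL = 1.0  # assumed goal position for the toy 1-D task


def toy_dynamics(states, actions):
    # Stand-in for a learned predictive model: the point moves 0.1 * action.
    return states + 0.1 * actions


def sparse_reward(states, actions, next_states):
    # Sparse reward: 1 only when the next state is inside the goal region.
    return (np.abs(next_states[:, 0] - GOAL) < 0.05).astype(float)


def toy_value(states):
    # Stand-in for a learned value function: closer to the goal is better.
    return -np.abs(states[:, 0] - GOAL)


def plan_action(state, dynamics, reward, value, rng,
                action_dim=1, horizon=5, n_candidates=256, gamma=0.99):
    """Random-shooting planner (illustrative assumption, not the paper's code).

    Samples candidate action sequences, rolls each through the model,
    adds a discounted terminal value estimate, and returns the first
    action of the highest-scoring sequence.
    """
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        next_states = dynamics(states, actions[:, t])
        returns += gamma ** t * reward(states, actions[:, t], next_states)
        states = next_states
    # The terminal value gives a learning signal beyond the planning horizon,
    # which matters when no sparse reward is reachable within it.
    returns += gamma ** horizon * value(states)
    return actions[int(np.argmax(returns)), 0]


# Closed-loop rollout: replan at every step, apply only the first action.
rng = np.random.default_rng(0)
state = np.array([0.0])
for _ in range(50):
    action = plan_action(state, toy_dynamics, sparse_reward, toy_value, rng)
    state = toy_dynamics(state[None, :], action[None, :])[0]
final_state = state
```

From the starting position the goal is out of reach within a five-step horizon, so every candidate sequence initially earns zero sparse reward and the terminal value term alone ranks them. This is the role a value function plays in a sparse reward setting: it extends the planner's effective lookahead beyond the model rollout.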