Deep Model-Based Reinforcement Learning for Predictive Control of Robotic Systems with Dense and Sparse Rewards

Luka Antonyshyn, Sidney Givigi

DOI: https://doi.org/10.1007/s10846-024-02118-y

2024-07-11

Journal of Intelligent and Robotic Systems: Theory and Applications

Abstract: Sparse rewards and sample efficiency are open areas of research in the field of reinforcement learning. These problems are especially important when considering applications of reinforcement learning to robotics and other cyber-physical systems. This is so because in these domains many tasks are goal-based and naturally expressed with binary successes and failures, action spaces are large and continuous, and real interactions with the environment are limited. In this work, we propose Deep Value-and-Predictive-Model Control (DVPMC), a model-based predictive reinforcement learning algorithm for continuous control that uses system identification, value function approximation and sampling-based optimization to select actions. The algorithm is evaluated on a dense reward and a sparse reward task. We show that it can match the performance of a predictive control approach to the dense reward problem, and outperforms model-free and model-based learning algorithms on the sparse reward task on the metrics of sample efficiency and performance. We verify the performance of an agent trained in simulation using DVPMC on a real robot playing the reach-avoid game. Video of the experiment can be found here: https://youtu.be/0Q274kcfn4c.

What problem does this paper attempt to address?

This paper addresses two open problems in reinforcement learning, sparse rewards and sample efficiency, which are especially important in robot control and other cyber-physical systems. Specifically:

1. **Sparse Rewards**: In many practical tasks, especially goal-oriented ones, a reward is given only when a specific goal is achieved. Without frequent feedback to guide action selection, it is difficult for an agent to find an effective policy through exploration.
2. **Sample Efficiency**: Sample efficiency refers to how many interactions with the environment an agent needs before it learns an effective policy. In applications such as robot control, the physical system limits the number of experiments that can be run, so improving sample efficiency is crucial for learning quickly.

To address these problems, the paper proposes Deep Value-and-Predictive-Model Control (DVPMC), a deep model-based predictive control algorithm for continuous-control tasks. The algorithm combines system identification, value function approximation, and sampling-based optimization to select actions. It is evaluated on one dense reward task and one sparse reward task: it matches a predictive control approach on the dense reward task, and it outperforms model-free and model-based learning algorithms on the sparse reward task in both sample efficiency and performance. An agent trained in simulation with DVPMC is also verified on a real robot playing the reach-avoid game.
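The combination described above (a predictive model, a value function, and sampling-based action selection) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain random shooting as the sampling-based optimizer, and `dynamics` and `value` are hand-written, hypothetical stand-ins for what would be learned networks. The 2-D point task, `GOAL`, the 0.05 success radius, and all default parameters are likewise assumptions chosen for illustration.

```python
import numpy as np

GOAL = np.array([1.0, 1.0])  # hypothetical 2-D target position


def dynamics(states, actions):
    """Stand-in for a learned model: a point moved by a bounded velocity command."""
    return states + 0.1 * np.clip(actions, -1.0, 1.0)


def sparse_reward(states):
    """Binary reward: 1 inside a 0.05 radius of the goal, 0 everywhere else."""
    return np.where(np.linalg.norm(states - GOAL, axis=-1) < 0.05, 1.0, 0.0)


def value(states):
    """Stand-in for a learned value function: larger closer to the goal."""
    return -np.linalg.norm(states - GOAL, axis=-1)


def plan(state, rng, horizon=5, n_samples=256, gamma=0.99):
    """Random shooting: sample action sequences, score them, return the best first action."""
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, 2))
    states = np.repeat(state[None, :], n_samples, axis=0)
    returns = np.zeros(n_samples)
    for t in range(horizon):
        states = dynamics(states, actions[:, t])
        returns += gamma**t * sparse_reward(states)
    # The terminal value scores sequences that never reach the goal within the horizon.
    returns += gamma**horizon * value(states)
    return actions[np.argmax(returns), 0]


def run_episode(seed=0, max_steps=60):
    """Replan at every step; stop when the sparse reward fires."""
    rng = np.random.default_rng(seed)
    state = np.zeros(2)
    for step in range(max_steps):
        if sparse_reward(state) > 0:
            return state, step
        state = dynamics(state, plan(state, rng))
    return state, max_steps


if __name__ == "__main__":
    final_state, steps = run_episode()
    print("steps:", steps, "distance to goal:", np.linalg.norm(final_state - GOAL))
```

In this sketch the terminal `value` term is what makes the sparse reward workable: far from the goal, every sampled sequence collects zero reward within the horizon, so without the value estimate the planner would have no basis for ranking them.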
